Abstract

In modern information and documentation systems, the thesaurus 
has the important function of serving as a link between the 
author of a document and the information searcher. For this 
reason, the linguistic structure of thesauri should receive more 
attention than has been the case so far. In this study, the 
intermediate function of the thesaurus has led to an investiga- 
tion of the principles for structuring and representing informa- 
tion in such a way that it corresponds to the cognitive structure 
assumed to exist in the information searcher. The model of inves- 
tigation of cognitive representation is based on overt manifesta- 
tions of concepts and conceptual relations as they emerge in the 
abstract language of titles of scientific documents. On the 
basis of this language structure, an algorithm has been developed 
for coding the relevant concepts by means of prepositions. The 
relevance of the concepts is judged with respect to a schema 
model containing the main components Problem, Method and Goal, 
assumed to represent the basic components of research itself. 
When the model is applied to scientific titles, the assumption is verified.
The algorithm generates the concepts corresponding to the compo- 
nents, and assigns them to different data registers. An important 
result of the analysis is thus that the intermediate language 
structure of scientific titles displays a degree of abstractness 
suitable for automatic concept extraction, provided that an in- 
formation- or cognition-oriented approach is employed in order 
for the data registers to be properly interpreted. 

Key words: algorithmic coding, computational linguistics, concept 
recognition, empirical database, information science, representa- 
tion language, structural analysis, text processing, text repre- 
sentation. 
1. PROBLEMS AND AIMS

From the point of view of information technology, society today 
is characterized by a high level of development. New information 
and documentation systems (I & D systems) have been, and are be- 
ing, devised in many places around the world. But the effects of 
this development vary considerably. A case in point is the phe- 
nomenon which may be called "information frustration". Informa- 
tion, the first part of this compound, refers to the meaning an 
individual abstracts from data. Data are characterized by physi- 
cal existence in the sense that they can be counted, measured, 
and classified. The second part of the compound refers to a psy- 
chological phenomenon connected with the barriers that have 
emerged between the information searcher and the goal of the 
search, namely to get access to the information contained in the 
documents. 

One of the more serious barriers in this connection is the 
multiplicity of document descriptions, which are disseminated 
without the individual's having access to the documents them- 
selves. This circumstance creates an often well-grounded uncer- 
tainty about the real existence of a document. Another barrier 
is that advanced technical systems have been built up as a con- 
necting link between the information searcher and the information 
potentially available. The user is often afraid of the ma- 
chines, and he is not always aware of the logic on which the 
programs are based. Another complicating factor is that the
"troublesome" routines are not free of charge; further, the
systems usually do not provide user-oriented tutoring functions.
An example of a desirable guiding program is "Interactive
consulting via natural language", developed by Shapiro & Kwasny
(1975). Large groups of potential users are in effect prevented
from gaining access to such information as is absolutely necessary
for the development and management of information. A third barrier is
that as yet no computer programs have been developed which can 
handle document descriptions presented in several different 
languages. Such programs would be of great help in international 
cooperation concerning matters of I & D for different subject 
areas. 

For some time it has been claimed that modern society is
characterized by information overflow and information explosion
(Price, 1963; Anderla, 1973). Such claims, however, cannot be 
fully accepted, since objections may be raised against Price's 
global methods and overgeneralizations (see Gilbert, 1978; B. 
Bierschenk, 1979). Moreover, several studies have shown that many 
professional categories requiring information, such as research- 
ers and teachers (B. Bierschenk, 1974; Jernryd, 1976; 1978), seem 
to be suffering from a considerable lack of information. 

To set things right, increased research on the basic mechanisms 
at work in information selection, storage and retrieval is re- 
quired. Increased communication is no solution to the problem. 
There is a need for greater insights into and a better understand- 
ing of how information can be made available, enabling the indi- 
vidual to adapt it to the cognitive structures characterizing the 
models which govern his search for information. In the present 
work the available information consists of so-called "non-fugitive
information", i.e. the kind of information that has been
documented. A document may then be defined as

a written or printed record, being the definite proof that 
information exists. 

Thus documentation means, among other things, the supplying of 
documents. Libraries traditionally supply documents. But when it 
comes to the computer-based I & D systems, the situation is dif- 
ferent. They traditionally supply only document descriptions or 
references to documents. Thus documentation has also come to im- 
ply the organization and representation of document descriptions. 

Against this background the concept of data base, in connection 
with information and documentation, may be defined as 

a set of data which is part of another set of data (docu- 
ments) and which consists of at least one register, which 
is organized in such a way that its structure is suited for 
a precise description of the documents of which the first 
set of data is a part. 

On the basis of this definition, register refers to a list of 
specified data, whose purpose is to describe a document. 
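Purely as an illustration of this definition, the Python sketch below models a data base as a set of named registers, each mapping a document identifier to the list of specified data describing that document. The register names, document identifiers and entries are hypothetical; a search of this kind over descriptive data corresponds to what is called document retrieval further on.

```python
from typing import Dict, List

# Hypothetical illustration: a register maps a document identifier to the
# list of specified data that describe that document.
Register = Dict[str, List[str]]

# A data base consisting of (at least) one register; two are shown here.
database: Dict[str, Register] = {
    "author": {"doc1": ["Smith"], "doc2": ["Jones"], "doc3": ["Smith", "Jones"]},
    "keyword": {
        "doc1": ["thesaurus"],
        "doc2": ["classification"],
        "doc3": ["thesaurus", "indexing"],
    },
}


def search(db: Dict[str, Register], register: str, value: str) -> List[str]:
    """Return identifiers of documents whose entry in `register` contains `value`."""
    entries = db.get(register, {})
    return sorted(doc for doc, data in entries.items() if value in data)


def describe(db: Dict[str, Register], doc_id: str) -> Dict[str, List[str]]:
    """Collect everything the registers say about one document."""
    return {name: reg[doc_id] for name, reg in db.items() if doc_id in reg}


print(search(database, "keyword", "thesaurus"))  # ['doc1', 'doc3']
```

The point of the sketch is only that what can be retrieved is bounded by which registers exist and how their entries are structured.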

The possibility of retrieving meaningful information from a
data base is determined by the way in which the documents are
described and how these descriptions are structured in the
registers. The descriptions constitute representations of physical
documents. The basic problem for every modern I & D system is to
find the formats of representation most adequate to its different
goals, and to develop inference mechanisms that support
comprehension of the stored information.

One format of representation may be a purely bibliographic de- 
scription. Examples of bibliographic elements are "name of au- 
thor", "co-author", "title", or "name of journal". Another kind 
of representation may be "outward" characteristics, such as 
"colour of cover", "layout", "binding", or "general condition" 
of a document. A system may also need to discriminate between 
types of documents, i.e. books as distinct from non-book materi- 
al, a way of characterization which has caused trouble all over 
the world, partly because the borderlines between what should and 
what should not be regarded as books have become increasingly 
blurred due to new composing and copying techniques. 

All these ways of characterizing a document may be said to be 
descriptive. Many information systems are based on an organiza- 
tion of such descriptive data about documents, and the retrieval 
from a search in such data bases should, therefore, be called 
document retrieval. 

The use of the term "information retrieval" presupposes that 
the representation is a result of some form of analysis of what 
a document is intended to communicate. This process is referred 
to by many terms, some of which are "content analysis", "content 
detection" and "making judgement of aboutness" (see e.g.
Fairthorne's discussion in the Annual Review of Information Science
and Technology, 1969). 

A document description based on content analysis adds a cogni- 
tive dimension to the I & D system which, in turn, may have dif- 
ferent representational levels. The analysis may be represented 
by means of some keywords (or descriptors) being added to the do- 
cument description, relating the document to a certain conceptual 
structure. A keyword is a word or term taken from the document
itself and assigned to it for indexing purposes. A descriptor is
a main term or phrase which, for the same purposes, is drawn from
a thesaurus (see below). In information and documentation, a word
is defined as a string of characters, while a term refers to a
word or phrase designating a concept. (The definitions in
connection with this subject field are taken from Wersig &
Neveling, 1976.)
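To make the distinction between keywords and descriptors concrete, the Python sketch below takes keywords directly from the words of a title and then maps them to descriptors through a small controlled vocabulary standing in for a thesaurus. The stop-word list and the keyword-to-descriptor mapping are hypothetical and are not taken from any real thesaurus; the title is the one discussed in connection with PRECIS later in this chapter.

```python
import re
from typing import Dict, List, Set

# Hypothetical stop-word list; real indexing rules would be far richer.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the"}

# Hypothetical controlled vocabulary standing in for a thesaurus:
# keyword (as found in the document) -> descriptor (as fixed by the thesaurus).
THESAURUS_MAP: Dict[str, str] = {
    "in-service": "In-service training",
    "training": "Training",
    "personnel": "Personnel",
    "industries": "Industry",
}


def keywords(title: str) -> List[str]:
    """Keywords: words taken from the document itself (here, its title)."""
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", title.lower())
    return [w for w in words if w not in STOP_WORDS]


def descriptors(title: str) -> Set[str]:
    """Descriptors: terms drawn from the controlled vocabulary, not the text."""
    return {THESAURUS_MAP[k] for k in keywords(title) if k in THESAURUS_MAP}


title = "In-service training of unskilled personnel in the American aircraft industries"
print(keywords(title))
print(sorted(descriptors(title)))
```

Keywords such as "unskilled" or "american" have no counterpart in the controlled vocabulary and therefore yield no descriptor, which is one way in which the two kinds of indexing diverge.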



Both keywords and descriptors may be assigned to documents on 
the basis of title, list of contents or search in the text.
Examples of higher levels are abstracts, which are a kind of
compressed informative or indicative summaries, and extracts,
which here are to be regarded primarily as selections of
representative paragraphs. These document descriptions are usually
provided not by the author himself, but by someone else.

In devising bibliographic descriptions it is convenient to
follow some international standard, e.g. that of the American
Psychological Association (APA). In general, the design of a
bibliographic data base does not cause any difficulties when the level of
structuring is low. Structuring based on content in documents, 
however, has raised many problems, since it is based on interpre- 
tation. The structuring principles within library science, for 
example, rely on philosophical thinking, as manifested in a clas- 
sification system. Information is then determined in such a way 
that the description of a document is adapted to the structure 
that characterizes the classification systems. Examples of such 
systems are the SAB (Sveriges Allmänna Biblioteksförening / The
General Library Society of Sweden) and the UDC (Universal Decimal
Classification). However, in connection with computer-based I & D
systems, means of assistance have been developed in the form of
the kind of structured dictionaries that are usually called
thesauri. A thesaurus is characterized by an organized display of
the relations which hold between terms and descriptors and which 
define these. No matter what organization form the designer of 
the system chooses, he has to see to it that information is made 
explicit. Thus the central problem of an information system is to 
find ways of representing information in such a way that the 
structure in which the author communicates it corresponds to the 
structure in which the reader perceives it or wishes it to be. 
Therefore, it is hardly possible to try to solve the so-called 
information problem without focusing on its cognitive aspects. 

This general introduction to the field of information and docu- 
mentation, terminology and computer-based systems and activities 
connected with it, contains the ideas which have governed the 
present attempt to tackle the information problem. To summarize, 
the aims of this study are: 



1. To present some basic principles governing the
systematization of information.

These principles are outlined in Chapters 2 and 3. 

2. To present a general model for re-cognition based on such
components in titles as are derived from research on
cognition, and to develop an algorithm capable of coding
titles in accordance with the model.

3. To analyse and demonstrate the linguistic representation 
format of the model, and to show the extent to which reg- 
ularities in language can be used in the communication of 
scientific information within a particular field of appli- 
cation. 

This theoretical discussion is presented in Chapter 4. 

4. To map structural relations in the linguistic representa- 
tion format and to describe those structures quantita- 
tively, and also to analyse the relationship between 
structures and types of documents. 

The results of the structural analysis are given in Chap- 
ter 5. 



5. To analyse and describe the components' conceptual
foundation in a data base, and to indicate intermediate
language functions.

6. To demonstrate the function of the registers in the data
base and the relevance of their entries as a basis for a
functionally oriented thesaurus in the field of education.

The presentation of these results is to be found in Chapter 6.
In the last chapter some concluding remarks are made.



2. STRUCTURING PRINCIPLES 



Communication processes are made possible by means of systems 
that are open with respect to information input. An information 
system designed to handle documents or document descriptions 
should have as its primary goal to provide an overview, as com- 
prehensive as possible, of incoming information. Its fundamental 
purpose, therefore, should be to create order. A universal con- 
ception of the creation of order has guided library systems in 
the past and still does. With the computerization of library 
science the possibility arose to automatically sort and organize 
bibliographic indexes and catalogues (e.g. author indexes). Such
so-called non-intellectual routines could be taken over by the
machines. Difficulties arose, however, when it came to
intellectual routines, such as indexing, i.e. the process that
includes analysis and classification of documents. The need for
automatic generation of subject indexes brought to the fore
problems other than bibliographic ones. These circumstances,
together with the greatly increased output of scientific documents
of different kinds, led to the establishment of a new discipline,
namely information science, which, besides library science,
involves systems theory, automatic text processing, linguistics, and computer
science. A fairly good picture of the content of and development 
within this field is given in the Annual Review of Information 
Science and Technology, whose first issue appeared in 1966. 

The process of selection, storage and retrieval of documents 
and document descriptions at the bibliographic level is no longer 
a technical problem. This level of representation will therefore 
not be further considered. Instead, the intellectual routines 
that are related to analysis and description of document content 
for information retrieval will be focused on. 

There are authors who, from the standpoint of natural science, 
argue that progress is made through detection of facts or through 
new ideas and events (e.g. Price, 1963). From the point of view 
of social science, however, it may be claimed that progress is to
a greater extent made through new principles of organization, new
theories, new relationships ("repacking of older information"
according to Anderla, 1973, p 120). It would hardly be realistic
to believe that an I & D system would be able to survey the total
amount of information, especially since information is constantly
changing. The adaptive properties of an I & D system, therefore, 
are reflected in its capacity to structure information in a flex- 
ible way. In the following section a short presentation of some 
principles for structuring of information will be given. 

2.1 Hierarchies

For the purpose of collecting "all" documented information in 
libraries, the UDC (Universal Decimal Classification) was devel- 
oped for the organization of book stocks on shelves. The starting 
point was Dewey's Decimal Classification (DC). The decimals are
retained within UDC, and the literature is organized in ten main 
categories, designated by the numbers 0-9. The UDC differs from 
the DC by using more than one decimal. 

The task of classification systems is to define the relation- 
ships between the single elements. The hierarchical structure 
that characterizes the UDC is similar to a tree structure with 
strict super- and subordination. Hierarchy is defined as a strict 
organizational system. By means of the ten main classes, related
subjects are grouped together, even though a strict ranking order
in the subclasses is maintained. Consequently, restructuring is 
only possible through addition and expansion along the outer 
edges of the tree. On the other hand, the UDC system may be used
more easily than other library systems for retrieval purposes
(see Mølgaard-Hansen, 1968). In order to illustrate the hierarchy
as a structuring principle, Box 1 shows the structure of General 
Linguistics as presented in the Swedish version of the UDC system 
(UDK, 1961). 

General Linguistics is classified directly under main class 4, 
i.e. "Linguistics, Philology". Class 41 indicates that the sub- 
ject field is rather old. It may be compared with, e.g., Psycho- 
logy, which is classified in main class 1 under "Philosophy", 
although as number 159.9 in the tree. This means that it is a 
subject field that has been added to the original structure at a 
later stage. Psychology, however, contains six times as many 
branches as Linguistics. 
Box 1. Example of hierarchy: General Linguistics as structured
by the Swedish version of the UDC.

41      General Linguistics
 411      Orthographic rules. Correct spelling. Orthography
    .4      Orthographic reform
 412      Word classes
 413      Lexicology. Dictionaries
    .1      Words according to meaning. Semasiology
      .11     Place names
      .13     Names of persons
      .14     Homonyms and synonyms
      .163    Foreign words. Loanwords
      .164    Professional terms. Technical terms
    .2      Dictionaries classified according to different aspects
 414      Phonetics. Phonology
 415      Grammar
    .4      Etymology. Semantics. Semiotics
    .5      Morphology. Accidence
    .6      Syntax
 416      Metrics. Prosody
 417      Reference sciences. Hermeneutics. Exegesis. Textual criticism
 418      Original sources of Linguistics
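Because the notation is hierarchical digit by digit, the strict super- and subordination described above can be read directly off the class numbers. The Python sketch below assumes plain numeric notations only (no auxiliary signs or colon combinations) and treats the point as a purely visual separator placed after every third digit; the function names are hypothetical.

```python
from typing import List


def _digits(notation: str) -> str:
    # The point is only a visual separator placed after every third digit.
    return notation.replace(".", "")


def _format(digits: str) -> str:
    # Re-insert a point after each group of three digits for display.
    return ".".join(digits[i:i + 3] for i in range(0, len(digits), 3))


def broader_classes(notation: str) -> List[str]:
    """Chain from the given class up to its main class (0-9)."""
    digits = _digits(notation)
    return [_format(digits[:n]) for n in range(len(digits), 0, -1)]


def is_subordinate(narrower: str, broader: str) -> bool:
    """True if `narrower` lies strictly below `broader` in the tree."""
    a, b = _digits(narrower), _digits(broader)
    return len(a) > len(b) and a.startswith(b)


print(broader_classes("413.14"))  # Homonyms and synonyms, from Box 1
print(broader_classes("159.9"))   # Psychology, under main class 1
```

For 413.14 the chain runs 413.1, 413, 41, 4; for Psychology it runs 159, 15, 1. A new subject can only be added by lengthening some existing notation, which is the sense in which restructuring is confined to the outer edges of the tree.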



The assignment of a document's place in a classification system 
depends on the preciseness of the indexing, i.e. the assignment 
of keywords or descriptors to the document. A way to avoid sub- 
jectivity in making decisions is to use a controlled terminology. 
Such a terminology for document description has been developed 
in computer-based I & D systems for the purpose of representing 
concepts and conceptual relations. Regardless of what principles 
may govern the structuring of a subject field, the structure em- 
ployed is represented in a thesaurus. The thesaurus is an aid in 
both indexing and information search. The best-known thesaurus
in the field of education is that developed by ERIC (Educational
Resources Information Center). The Thesaurus of ERIC Descriptors
(1975) is in principle hierarchically structured. Three types of
function terms are used for structuring the vocabulary. These
types are, basically, USE ("see or use") and UF ("used for"); BT
("broader term") and NT ("narrower term"); and RT ("related term").
The terms in the first group have a controlling function. USE
indicates which term is the more correct one (the one used by
professionals), whereas UF indicates which term has been used
earlier to designate the same particular field. Thus the UF
function admits retrospection, i.e. contact with the historical
development is maintained.

The descriptors are structured by means of the hierarchical 
relations BT and NT. The importance of these links is that the 
user can let the built-in hierarchies guide his search. The pos- 
sibility of relating different hierarchies is provided by the RT 
function. There is also a possibility of updating, i.e. addition- 
al descriptors may be assigned to the system, depicting the prog- 
ress of the subject field. 
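The function terms can be pictured as typed links between entries. The Python sketch below uses a tiny hypothetical fragment (the terms and links are invented for illustration and are not taken from the ERIC Thesaurus). It resolves a non-preferred entry term via USE, derives UF as the reciprocal of USE, and follows NT links downwards to broaden a search, optionally adding the RT links.

```python
from typing import Dict, List, Set

# Hypothetical thesaurus fragment: each descriptor lists its broader (BT),
# narrower (NT) and related (RT) descriptors.
DESCRIPTORS: Dict[str, Dict[str, List[str]]] = {
    "Reference materials": {"BT": [], "NT": ["Dictionaries", "Thesauri"], "RT": []},
    "Dictionaries": {"BT": ["Reference materials"], "NT": [], "RT": ["Thesauri"]},
    "Thesauri": {"BT": ["Reference materials"], "NT": [], "RT": ["Dictionaries", "Vocabulary"]},
    "Vocabulary": {"BT": [], "NT": [], "RT": ["Thesauri"]},
}

# Non-preferred entry terms point to the preferred descriptor (USE).
USE: Dict[str, str] = {"Lexicons": "Dictionaries", "Word lists": "Vocabulary"}


def preferred(term: str) -> str:
    """Follow a USE reference, if any, to the preferred descriptor."""
    return USE.get(term, term)


def used_for(descriptor: str) -> List[str]:
    """UF as the reciprocal of USE: the non-preferred terms it replaces."""
    return sorted(t for t, d in USE.items() if d == descriptor)


def expand_query(term: str, include_related: bool = False) -> Set[str]:
    """The preferred descriptor plus every descriptor reachable via NT links."""
    result: Set[str] = set()
    stack = [preferred(term)]
    while stack:
        d = stack.pop()
        if d not in result:
            result.add(d)
            stack.extend(DESCRIPTORS[d]["NT"])
    if include_related:
        # Step sideways (RT) from each descriptor found so far.
        for d in list(result):
            result.update(DESCRIPTORS[d]["RT"])
    return result


print(sorted(expand_query("Reference materials")))
print(sorted(expand_query("Lexicons", include_related=True)))
```

Letting the NT links drive the expansion is one way in which the built-in hierarchies can guide a search; the RT step is the link between hierarchies.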

2.2 Facets 

The concept of facet refers to an aspect of a document, a sub- 
ject, and so on. In an analysis of facets a complex subject field 
is decomposed into as many aspects as possible. One of the oldest 
philosophical classification systems is Ranganathan's "tree of 
knowledge", in which the world is described by means of the five 
facets "Space", "Matter", "Energy", "Time", and "Personality".
Ranganathan's (1964) Colon Classification has given rise to 
several thesauri, one of the better-known of which is the
Thesaurofacet, which classifies electrotechnical engineering and
related fields. This thesaurus is structured in "fundamental
facets", "subfacets" and "hierarchies" (see Aitchison, 1970).

One type of facet classification within the field of informa- 
tion and documentation is to be found in Terminology of Documen- 
tation (Wersig & Neveling, 1976), published by UNESCO. In compari- 
son with the ERIC Thesaurus, it may be noted that the Terminology 
of Documentation uses the same function terms, i.e. BT, NT and RT, 
according to the same principles as in ERIC. There is also a
function named OT ("opposite term"). Since this thesaurus is not 
linked to an information search system (data base), it has no 
reference terms. The Terminology of Documentation is an attempt 
to standardize terminology within the I & D field in the English, 
German, French, Spanish and Russian languages. For the classifi- 
cation three aims have been deemed important (p 12), namely (1) 
to connect terms from a certain area of a given subject field, 
e.g. terms relating to punch card systems; (2) to connect terms 
belonging to the same facet of a given subject field, e.g. terms 
denoting special systems; and (3) to avoid an excessive number of 
terms in each class. Besides facet classifications, definitions 
are also included. The terms (60 per group, at most) are placed 
under five faceted main headings: (1) "Basic aspects of informa- 
tion and documentation", (2) "Documents", (3) "The activities in 
information and documentation", (4) "Systems in information and 
documentation", and (5) "Organizations and professions in
information and documentation".

For the purpose of connecting this presentation with the field 
of education, the facet principle will be illustrated with an ex- 
ample from the EUDISED's (1973) Multilingual Thesaurus, which has 
a fully realized, albeit crude, facet classification and which, 
therefore, constitutes further development of the ERIC Thesaurus. 
The EUDISED Thesaurus is divided into 20 main facets. One of them 
is called "Documentation" and consists of two subfacets, namely 
"Information, Service" and "Index, Bibliography". Under the lat- 
ter are ordered eight facets, the second of which contains ten 
subfacets from which "Thesaurus" is chosen as an example (Box 2).

"Thesaurus" is, among other things, related to "Vocabulary".
A search for this term leads to the main facet "Literacy". The
terms related to "Vocabulary" show some new terms compared with
the "Thesaurus" facet, i.e. some variants of the term "Word"
("Word frequency", "Word list"). This is an example of the
possibility within the faceted structure of relating terms
horizontally, as compared with the vertical relations expressed
by hierarchies (Box 1).
Box 2. Example of a facet: "Thesaurus" from the EUDISED
classification

Thesaurus
  BT: Reference material
  RT: Dictionary
      Lexicology
      Terminology
      Vocabulary

Vocabulary
  RT: Lexicology
      Semantics
      Terminology
      Thesaurus
      Word
      Word frequency
      Word list

BT = Broader term, RT = Related term
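The horizontal step from "Thesaurus" to "Vocabulary" can be expressed as a simple set operation over RT links. The Python sketch below encodes the two entries of Box 2; the assignment of terms to the two entries is a reading of the box layout and should be treated as an assumption, and the function name is hypothetical. It lists the terms that become reachable only by following the RT link to "Vocabulary".

```python
from typing import Dict, Set

# Related terms (RT) of the two entries in Box 2, as read from the box;
# the grouping of terms is an assumption of this sketch.
RELATED: Dict[str, Set[str]] = {
    "Thesaurus": {"Dictionary", "Lexicology", "Terminology", "Vocabulary"},
    "Vocabulary": {"Lexicology", "Semantics", "Terminology", "Thesaurus",
                   "Word", "Word frequency", "Word list"},
}


def new_terms_via(start: str, via: str) -> Set[str]:
    """Terms reached through the RT link start -> via that were not already
    related to `start` (and are not `start` itself)."""
    if via not in RELATED.get(start, set()):
        raise ValueError(f"{via!r} is not a related term of {start!r}")
    known = RELATED[start] | {start}
    return RELATED[via] - known


print(sorted(new_terms_via("Thesaurus", "Vocabulary")))
```

Under this reading, the new terms are "Semantics" together with the variants of "Word", reached sideways through RT rather than up or down a BT/NT hierarchy.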



Artandi (1970) provides a summary of research and theories in 
classification. She illustrates a classification of the
behavioural sciences with a proposal made by Altmann & Riessler: "Unit
of study", "Dynamic-static properties of units", "Energy", 
"Transformation processes", "Intensity", "Distribution in time", 
"Distribution in space", and "Ecological setting". 

The best-known example of language facets is probably Roget's 
Thesaurus of English Words and Phrases, which was first published 
more than a century ago. It is structured according to six main 
facets: "Abstract relations", "Space", "Matter", "Intellect",
"Volition", and "Affections". These, in turn, are subdivided into
about twenty subfacets (see Browning, 1971). 

Different subject fields show different facet structure. More- 
over, one and the same field may be differently faceted, depend- 
ing on the classifier's view of the world. In spite of such ob- 
vious difficulties in the work on a universal classification 
system, the search for such a system has not ceased in library 
science (cf. Richmond, 1972).
2.3 Structural relations

Classification systems, whether structured in hierarchies or in 
facets, may be regarded as semantic in the sense that they orga- 
nize concepts according to generic relationships, synonymy, an- 
tonymy, and so on. The concepts refer to relations between words 
considered independently of context. In this connection, there- 
fore, it would be possible to distinguish between semantic and 
structural relations, where structural means context-dependent. 

Analysing the content of a document on the basis of its
structural relations entails the advantage that the choice of
keywords is restricted to a system of rules which may, to a great
extent, increase inter-indexer reliability in the document
description process. The most obvious advantage, however, is that
the document's own structuring of its content is represented. A
classification is deductively imposed on a document, while
analysis of structure admits an inductive generation of concepts
whose meaning is defined through their structural relations. This
type of analysis is also known as "concept analysis".

One of the better-known adaptations of concept analysis in
information science is the PRECIS system (PREserved Context Index
System). It has been used at the British National Bibliography (BNB)
since 1971 (Wellisch, 1977). During 1978 it was introduced in 
Swedish libraries. The different principles characterizing PRECIS 
(cf. Austin, 1977) may be outlined as follows. 

A typical PRECIS entry consists of a "heading" composed of a 
"lead" and its "qualifier" plus a "display". Cross references are 
possible in that more than one concept in an entry may become a
"lead". This is done through a rotation mechanism called
"shunting". The structural relationships are preserved by a numerical
and alphabetical code system. These operators define roles and 
links at different levels. The first level places the document in 
a context (e.g. geographic area) and determines to which observed 
system (subject field) the concept belongs. The second level 
accounts for the relevant data. This is followed by syntactic and 
semantic codes, which serve to localize the positions of the con- 
cepts, e.g. in a title, and which also function as keys to the 
original structure. 

In Box 3 the main characteristics of indexing and shunting 
according to the PRECIS system are presented. 
Box 3. Examples of structural relations: PRECIS indexing

Index string:

(0) United States
(1) aircraft industries
(p) personnel $h unskilled
(2) training $i in-service
    $v by $w of
(3) foremen

Role operators:

(0) Location
(1) Key system
(p) Part/property
(2) Action
(3) Agent
$h non-lead direct difference
$i lead direct difference
$v downward reading
$w upward reading

Entries:

Aircraft industries. United States.
    Unskilled personnel. In-service training by foremen.

Personnel. Aircraft industries. United States.
    Unskilled personnel. In-service training by foremen.

Training. Unskilled personnel. Aircraft industries. United States.
    In-service training by foremen.

In-service training. Unskilled personnel. Aircraft industries. United States.
    By foremen.


The operators indicate that the system is to a large extent based 
on "roles". An indication of transitivity is made explicit 
through the $v and $w operators. "$v by" thus means that the 
agent is to be found "downwards" (foremen). A good deal of
thinking in terms of a classification scheme remains, however,
probably due to the fact that the system was developed mainly for
libraries (the processing of cards). Instead of "personnel" being
categorized as "object" in the above example, it is represented as
a part or a property. But thinking in terms of roles has contri- 
buted to document analysis in that the aims of a document, which 
are not always made explicit in a title, can be (manually) in- 
dexed. The example given in Box 3 would probably be the result 
of an indexing of (in the most explicit case) a title like
"In-service training of unskilled personnel in the American aircraft
industries". The agent "foremen" has been retrieved from some 
other place in the document. 
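The rotation ("shunting") that produces the entries in Box 3 can be sketched in a few lines of Python. The sketch assumes the usual two-part PRECIS entry layout: a heading made of the lead followed by the qualifier (the terms above the lead, read upwards), and a display (the terms below the lead, in string order). It is deliberately simplified: differences ($h, $i) and connectives ($v, $w) are not handled, so its output is plainer than the entries in Box 3. The names `Entry`, `shunt` and `render` are hypothetical.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    lead: str
    qualifier: List[str]  # terms above the lead, nearest first
    display: List[str]    # terms below the lead, in string order


def shunt(terms: List[str], leads: List[int]) -> List[Entry]:
    """Generate one entry per lead position by rotating the index string.

    Simplified: differences ($h, $i) and connectives ($v, $w) are ignored.
    """
    entries = []
    for i in leads:
        qualifier = list(reversed(terms[:i]))
        display = list(terms[i + 1:])
        entries.append(Entry(terms[i], qualifier, display))
    return entries


def render(entry: Entry) -> str:
    """Heading on the first line, indented display on the second."""
    heading = ". ".join([entry.lead] + entry.qualifier) + "."
    if not entry.display:
        return heading
    return heading + "\n    " + ". ".join(entry.display) + "."


# Index string of Box 3 without differences and connectives.
string = ["United States", "Aircraft industries", "Personnel", "Training", "Foremen"]
for e in shunt(string, leads=[1, 2, 3]):
    print(render(e))
```

Each term selected as lead carries its full upward context in the heading, which is how the structural relationships of the string are preserved in every entry.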



2.4 Networks 

The structures described as networks in connection with informa- 
tion science may be regarded as relational systems based on asso- 
ciations. With respect to the associationistic principle the net- 
works have a psycholinguistic foundation. For the purpose of de- 
veloping models for storing information (memory structure) sever- 
al simulation programs have been constructed. Attempts have been 
made to build in a capacity for answering questions, asked from 
the point of view of one frame of reference, by using information 
from another frame of reference. Each frame of reference is a net- 
work connected to other frames through "associative links". 
Quillian's (1968) suggestion for the structuring and representa- 
tion of "semantic memory" is usually considered the source of all 
further progress within this field. Some of his followers
deserving mention are Woods (1973), who has worked on "transition
networks", Simmons (1973) as a representative of the "question
answering" field, and Schank (1972), whose main contribution to
"natural language understanding" is his analysis of "conceptual 
dependency". The various theoretical views are to be found prima- 
rily in the field of cognitive psychology. Another field that has 
focused on the development of a theoretical framework is
artificial intelligence (AI). The main goal of AI is to develop
mechanisms for logical deduction, i.e. the application of rules of
inference to statements made in a formal language, whose seman- 
tics is well specified. These branches of science are mainly 
concerned with the development and study of organizations of mem- 
ory. One attempt in this direction is an organization in the form 
of networks. (A more thorough discussion of this is presented in 
Chapter 3.) The network as a structuring principle is illustrated 
in Figure 1.
Figure 1. Example of a network structure
(C*: concept of primary order; C: concept of higher order;
E ... E: extensions (attributes))



A network structure has the following characteristics. The repre- 
sentation is based on the meaning of the concepts and not on the 
explicit syntactic order between them. Quillian's memory model 
consists of nodes which are interconnected via associative links. 
Each node may be regarded as a concept which has been given a 
name (label) . The node is an entry whose meaning is specified by 
the links. The direction of the links is dependent on the relationships
between the concepts in a hierarchic system (directed
graph). This system makes use of the language to specify different
degrees of abstraction at which the concept of primary order
defines the concept of higher order, and so on. Links from one 
subject label lead to the properties (attributes) that form the 
basis for the object label (extension). From the object labels
links are then directed to the concept of primary order that was 
designated by the object label. The concepts of primary order 
provide the basis for the formation of concepts of higher order. 
Thus the networks are associative because they consider the 
relationship between concepts and attribute lists and between one 
attribute list and another. 
There are many concrete proposals concerning the network model
in the literature, one of which is given in Lindsay & Norman 
(1972, p 409). For example, the system should be able to answer 
the question "Does a canary breathe?" A search in the memory for 
the information needed to answer that question takes varying 
amounts of time, depending on the level of abstraction at which 
the information required is stored. If the label "canary" is not 
immediately associated with the property "to breathe", it means that
the label and the property are to be found at different levels in 
the network. "To breathe" is a general property characterizing a 
class of which "canary" is the label of a member. "Bird" is the 
primary concept, having, e.g., the properties "to fly", "feathers" 
and "wings". If a canary is found to be a "bird", this informa- 
tion implies that it is also an "animal", since "bird" is. "Ani- 
mal" has the property "to breathe", which means that the answer 
to the question "Does a canary breathe?" is "yes". The time 
needed to answer that question is longer than would have been the 
case if the question had been worded "Is a canary yellow?", because
that information is stored at the "canary" node itself: "yellow" is
a concrete description of the concrete object labelled "canary".
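To make the canary example concrete, here is a minimal sketch, assuming a toy node table in which each concept has one "isa" link to a more general concept and a list of properties. The concept names and properties come from the example above; the dictionary layout and the function name `has_property` are hypothetical. The number of links traversed stands in for the search time discussed in the text; this is an illustration, not a reconstruction of Lindsay & Norman's model.

```python
from typing import Optional

# Hypothetical node table: each concept has an optional "isa" link to a
# more general concept and a set of properties stored at that node.
NETWORK = {
    "animal": {"isa": None, "properties": {"to breathe"}},
    "bird": {"isa": "animal", "properties": {"to fly", "feathers", "wings"}},
    "canary": {"isa": "bird", "properties": {"yellow"}},
}


def has_property(concept: str, prop: str) -> tuple[bool, int]:
    """Walk upward along "isa" links until the property is found.

    Returns (answer, levels), where levels counts the links traversed,
    a stand-in for the search time discussed in the text.
    """
    levels = 0
    node: Optional[str] = concept
    while node is not None:
        if prop in NETWORK[node]["properties"]:
            return True, levels
        node = NETWORK[node]["isa"]
        levels += 1
    return False, levels


print(has_property("canary", "yellow"))      # (True, 0): stored at the same node
print(has_property("canary", "to breathe"))  # (True, 2): found two levels up
```

The question about breathing succeeds only after two upward steps, whereas the question about colour is answered at the starting node, which mirrors the difference in answering time described above.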

This kind of network operates synthetically. The process goes 
from bottom to top. The associative functioning implies, among 
other things, context-independent activation of the nodes. The 
goal of network researchers is to find out how the network 
should be structured and how many "semantic primitives" are re- 
quired to include all the information to be represented. This 
means that the networks store an explicit structure between prop- 
erties. But what this type of structure does not represent are 
hierarchical relations between concepts. Thus one cannot deter- 
mine what kind of inferences can be drawn about the concepts in 
the networks. 



2.5 Schema 

By the mid-1970s the associative memory models began to be replaced
by a structural approach, deriving from Bartlett's (1932)
suggestion that the remembering process is schematic. In order to
clarify the term "schema", some basic statements from Cofer's
(1976) presentation of research on memory capacity will be summarized
here.

The human capacity for remembering information depends on our 
capacity for coding it. These capabilities are most likely indi- 
vidual, based on the individual's strategies for processing in- 
formation. The individual uses language in order to construct 
propositions which describe events that he has observed. The 
information expressed by these propositions is coded and stored 
in long-term memory. The observed events constitute experiences, 
so when one talks about different "worlds of experience", this 
indicates the assumption that individuals have acquired different 
representations of propositions. Such a set of propositions may 
be characterized by means of a schema. Thus a schema is a struc- 
tural model in which the components have certain specific rela- 
tionships to each other. Schemas require context. The coding and 
processing of an input, therefore, eliminate the need for activa- 
tion "from the bottom" to attain a meaningful information struc- 
ture, as is required in the network model. Instead, the schema 
model operates on an adaptive basis. The structure of a schema is 
the result of abstractions, i.e. a generalizable pattern, which 
excludes other patterns when activated in the information analy- 
sis process. 

Several experiments on memory structure have shown that the 
syntactic form (the manifest level) of a sentence does not have 
any crucial importance for retention (see Greene, 1977). It is 
the semantic relations that are retained. These relations seem to 
be selected and transformed according to a model characterized by 
a "role" or a "case" structure (the latent level). Furthermore,
Kintsch (1974) has pointed out that the verb determines the ex- 
tent to which sentences are confused in a so-called "recognition 
experiment", a result which supports Wearing's (1972) and Reid's
(1974) argument that the verbs are only indirectly represented at
a latent level.

A model for the analysis of latent structures has been develop- 
ed and tested (Bierschenk & Bierschenk, 1976; B. Bierschenk, 
1977). It is context-oriented and based on the assumption that 
the manifest level can be used for the construction of a schema 
suited for the analysis of latent dimensions. In this model, too, 
the function of the verb as an organizer of concepts and concep- 
tual relations (abstractions) is of crucial importance. 
A representation of information based on the schema principle
is rational and saves memory space. As opposed to the explicit 
network structure, the schema structure utilizes abstractions, 
i.e. symbols and relations between symbols. Thus information can 
be procedurally embedded within the structure of a schema. In- 
stead of concentrating on how information has to be re-structured 
at every question, a schema model attempts to find out what 
should be activated in order for the system to be able to give an 
adequate answer. The schema model is thus based on heterhierar- 
chical functioning, which implies utilization of cues in lower 
domains in order to signal the activation of a certain component 
of the schema, which can then be applied to the data. Moreover, 
the model has psychological relevance concerning both "recogni- 
tion" and "recall". 

The model employed to illustrate the schema principle is the
Agent-action-Object (AaO) model (for a description, see Chapter 4).
The three components of this model can be used for action-oriented
schematic representation. If, for example, the aim is to
study what agents act through given actions towards given goals, 
this may be schematically represented as in Figure 2. 

Figure 2. Example of an action schema: ? - action (write) - ?



It can be seen that there are two place holders, so-called de- 
fault variables. What values these default variables will assume 
when the components are activated depends on what can be realized 
by the action "to write". For example, the question mark to the 
left may be replaced by "researcher" and the one to the right by 
"reports". 
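The filling of default variables can be sketched as follows, assuming a simple record type in which the action is fixed and the agent and object slots stay empty until the schema is activated. The class name `ActionSchema`, its fields and its methods are hypothetical illustrations, not part of the AaO model as described in Chapter 4.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActionSchema:
    """Hypothetical Agent-action-Object schema.

    The action is fixed; agent and obj are default variables
    (place holders) that remain None until the schema is activated.
    """
    action: str
    agent: Optional[str] = None
    obj: Optional[str] = None

    def activate(self, agent: str, obj: str) -> "ActionSchema":
        # Return a new instance in which both place holders are filled.
        return ActionSchema(self.action, agent, obj)

    def render(self) -> str:
        # Unfilled place holders are shown as question marks.
        return f"{self.agent or '?'} - {self.action} - {self.obj or '?'}"


write = ActionSchema("write")
print(write.render())                                    # ? - write - ?
print(write.activate("researcher", "reports").render())  # researcher - write - reports
```

Returning a new instance from `activate` keeps the unfilled schema reusable, so the same "to write" schema can be activated with other values in other contexts.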

A representation of the values manifested here ("The researcher 
writes reports") would, e.g. from the point of view of a sentence 
schema, be made in terms of Subject-predicate-Object and the 
model would, instead of AaO, be called the SvO or the N1vN2 model.
Another schema may be:

author, title, place of publication: publisher, date of publication

This schema is regarded as an international standard in the citation
of scientific references. Here there are at least five components
that must assume bibliographic "values". Unless all components
are activated, the reference is considered incomplete.
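The completeness rule can be expressed as a small check, assuming one slot per component of the citation schema above. The slot names, the helper functions and the placeholder values in the usage lines are hypothetical.

```python
# Hypothetical slot names, one per component of the citation schema.
CITATION_SLOTS = ("author", "title", "place", "publisher", "date")


def missing_components(reference: dict[str, str]) -> list[str]:
    """Return the schema components that have not been given a value."""
    return [slot for slot in CITATION_SLOTS if not reference.get(slot)]


def format_reference(reference: dict[str, str]) -> str:
    """Render a reference in schema order; refuse incomplete references."""
    missing = missing_components(reference)
    if missing:
        raise ValueError(f"incomplete reference, missing: {', '.join(missing)}")
    r = reference
    return f"{r['author']}, {r['title']}, {r['place']}: {r['publisher']}, {r['date']}"


# Placeholder values only; not a real publication.
ref = {"author": "A. Author", "title": "A Title", "place": "City",
       "publisher": "Publisher", "date": "1979"}
print(format_reference(ref))                         # A. Author, A Title, City: Publisher, 1979
print(missing_components({"author": "A. Author"}))   # ['title', 'place', 'publisher', 'date']
```

The components are always present in the schema; only their values vary, and a reference with any empty component is rejected.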

To summarize, the characteristic feature of a schema is that its 
components are always present. They must, however, be provided 
with variables, which means that a schema is operationalized. But 
the single values of the variables are dependent on the context 
within which the schema is to be activated.
3. MODELS OF REPRESENTATION

Information can be represented with different degrees of complex- 
ity. The representation of a given type of information can be 
different for different purposes, which, among other things, is 
indicated by the type of document in which it is presented (research
reports, journal articles, handbooks, etc.). Through transformations
information can be restructured. Transformations are
part of all intellectual activity, constituting a cognitive pro- 
cess which can manifest itself in a language structure. Thus 
each transformation implies a change of the language structure 
and, consequently, there is a connection between level of com- 
plexity and language structure. The author of a scientific report 
may transform the text himself to make it appropriate as a jour- 
nal article, or he may write an abstract of it using no more 
than a hundred words. The more the text is compressed, the more 
abstract relations it contains per unit of space. From this 
point of view, the title may be regarded as the most compressed 
statement about the content of a work. Moreover, in an I & D 
system the representation of information relies heavily on the 
fact that someone (often called a documentalist) other than the 
author himself performs the transformations required for docu- 
ment descriptions for different purposes, e.g. writing abstracts 
or indexing through the assignment of some characterizing words 
(keywords, descriptors), often based on interpretation of the 
title alone. As a rule, such descriptions form the sole basis 
for communication between the author of the document and the in- 
formation searcher. Document descriptions, therefore, are "sur- 
faces of communication", whose language has the most central 
function in this kind of communication. 



3.1 Intermediate languages 

Language may, very broadly, be described as a means of communica- 
tion. In this respect, natural and artificial languages do not 
differ. Their communicative capacity and function, however, 
differ according to the degree of formalization employed. Gener- 
ally speaking, in relation to artificial languages, natural 
language is characterized by an abundance of variations and al- 
ternative interpretations, necessary for its function as a means 
of communication between human beings. A greater degree of
formalization makes for greater precision and clarity, thus
restricting the number of possible interpretations. An artificial
language is characterized by standardization of vocabulary and
rules, whose meaning must be unambiguously definable. Thus high-level
languages (FORTRAN, Algol, LISP, etc.) display a more elaborate
treatment of semantics and are based on a strictly formalized
logic, i.e. rules for manipulating statements. The direct
opposite to this may be exemplified by the language used in natural
discourse.

When concepts and conceptual relations are to be translated 
from a natural language into an artificial one, difficulties will 
arise owing to just that sharpness of definitions that has to 
replace several different interpretations possible in natural 
language expressions. Such transformations are processed at dif- 
ferent levels (with more or less formal and explicitly stated 
changes), which does not make it easy to define the borderline 
where a natural language becomes artificial. 

What ought to be focused on, however, is the mechanism that 
underlies the transition process. This mechanism forms the basis 
of the languages that have been developed in order to make pos- 
sible access to documents, i.e. documentary languages (concerning 
this term, see Wersig & Neveling, 1976, p 67). These languages 
are in principle as numerous as are information systems. Docu- 
mentary languages, such as PRECIS (see Chapter 2.3), have a 
variety of structures, implying that the differences are comparable
to those to be found in natural languages; in a sense, they
are an abstract reflection of them. The differences in structure 
apply to documents (direct description) as well as to descrip- 
tions of documents (indirect description) (Coyaud, 1966, p 127). 

A language must have a lexicon and a set of rules. In spite of 
many attempts to formulate abstracts of varying length for I & D
systems, it has not been possible to formulate rules specifying 
how such paraphrasing of an original document should be done. 
To be sure, an abstract describes a document, but the language 
employed cannot be termed a documentary language. Furthermore, 
the abstract is, besides the full text of a document, the only 
type of description that consists of complete sentences. The next 
higher level of description would be the title, which, in general, 
has a reduced syntax and which could be regarded as the beginning
of an artificial language. In addition, the title is the level of 
description most frequently drawn on in determining a document's 
content and in the generation of descriptive terms, both manually 
and automatically. Therefore, the title is the communicative 
interface between the document and the indexer, having a key 
function as the last "station" before the content is transferred 
to new media. 

Irrespective of how many stages of abstraction are used in the 
transformation of document content, these descriptions may be 
considered variants of the language used within one and the same 
type of medium. Similarly, other media have their own language 
variants, e.g. the computer languages. The linkage between docu- 
ment and computer in a computerized I & D system is performed 
through some kind of communication, which starts at the title 
level and is connected to the "surface" of the computer, i.e. to 
a symbolic language. This language is then transformed into a 
machine language, which is the internal language of the computer 
medium. Thus communication between media is also performed 
through languages (with a lexicon and systems of rules), which 
consequently may be called intermediate. The intermediate function
refers to a medium between languages as well as to a medium
between the author of a document and the information searcher
(cf. Coyaud's term "langage intermédiaire" (intermediate language)
in Coyaud, 1966, pp 18-19). The structures of those kinds of language are usually
represented in a thesaurus. 

As mentioned in Chapter 2, the thesaurus supplies descriptors 
for indexing documents. The concept of indexing is defined in 
Coyaud & Siot-Decauville (1967, p 40) as

"the translation of written documents into their
representation in a documentary language."

Indexing is a process which starts with recognition and identifi- 
cation of content, followed by definition by means of a state- 
ment ("This report is about..."). This statement is then repre- 
sented by index terms, classification number, or descriptors 
taken from a thesaurus. 

The main problem with such "informal interpretation of a document"
(impressionistic content analysis) is that explicit description
and operationalization of the interpretation process
cannot be achieved (see Sparck Jones & Kay, 1973, p 18). The 
cognitive model used by the indexer is usually unknown, even to 
the indexer himself (Robinson, 1977, p 170), which means that 
errors made in the representation of facts cannot be controlled. 
Since the transformation functions have not been explicitly for- 
mulated, there is no telling in what respect a document's repre- 
sentation form differs from its original form. 

In document description librarians, information scientists and
linguists alike talk about "indexing language" (see e.g.
Sharp, 1967; Sparck Jones & Kay, 1973; Soergel, 1974). As men- 
tioned above, indexing is a cognitive process, whose "language" 
has not been made explicit; further, it cannot without difficul- 
ties be made explicit. The control mechanisms available are the 
vocabulary in the subject index, the classification scheme and 
the thesaurus. The indexer makes use of these means in describing 
a document. He is the channel, while the means of control are the
media. As such they function as variants of an intermediate 
language. Language has here been defined as a "means of communi- 
cation" having a lexicon and rules. Since it is possible to talk 
about more or less communicative variants of language, the commu- 
nicative ability of a language may be defined through the struc- 
ture that characterizes each set of lexicon (vocabulary) and 
rules (grammar). The structure of an intermediate language is
represented through the thesaurus. The standardized terminology 
in the field of information and documentation (Wersig & Neveling, 
1975, p 118) defines thesaurus as 

"A controlled and dynamic documentary language 
containing semantically and generically related 
terms, which comprehensively cover a specific 
domain of knowledge." 

Although a thesaurus is defined as a "language" in the passage just
quoted, only the structure of the vocabulary is emphasized. Its
dynamic properties, however, are equally important, especially
in computer-based I & D systems, where the rules stating how the 
terms could and should be combined for search and retrieval have 
to be made explicit. 

The above discussion, it is hoped, will have made it clear that 
a presentation of intermediate languages within I & D should in- 
volve a description of the organization and function of thesauri 
and other more or less structured vocabularies in relation to the
role they ought to play in the communication between document and 
information searcher. 

Since the use of a thesaurus implies active and explicit as- 
signment of descriptors to documents, no attention will be paid 
to classification systems (see Chapter 2) in the subsequent pre- 
sentation. 






3.2 The thesaurus as a means of communication 

A thesaurus, according to the above outline, will here be regard- 
ed as a language, whose purpose is to make communication possi- 
ble, primarily in computer-based I & D systems. From the litera- 
ture concerning the construction of information systems it is 
evident that there is a certain confusion as regards terminology 
in the field, which several authors (e.g. Fairthorne, 1969) have 
noted as a typical feature. Many of the statements made and po- 
sitions taken are no doubt due to the fact that the field is a 
relatively new branch of science (having existed for no longer 
than approximately 20 years), and that people working within this
area represent different traditions. The borderlines between the 
disciplines involved are vague. At the same time an integration 
between, e.g., library science, on the one hand, and general and 
computational linguistics, on the other, would be valuable. One 
result of this line of thought is Sparck Jones & Kay's (1973) 
attempt to show the extent to which linguistics and information 
science were actually integrated. But neither at that time (1973) 
nor in a more recent survey (1977) do the two authors seem to 
have recognized the key role played by the thesaurus as a lin- 
guistic phenomenon. They state (1973, p 46) that the most impor- 
tant linguistic interests lie 

"1. in the treatment of the text of the document, 

2. in the formulation of the description text, 

3. in the treatment of the description text." 

The linguistically interesting things that are built into the 
thesaurus, making possible the formulation and treatment referred 
to, are not discussed. There seems to be a tendency to focus on 
indexing as an activity in itself instead of on the medium on 
which it is dependent. Karlgren (1977) makes an attempt to dis- 
cuss the positions taken by Sparck Jones & Kay but, unfortunately, 
his position is not very precise either. For example, it is not
clear whether the author distinguishes between a language for
description and one for retrieval. However, Karlgren makes an
important point in stating that the retrieval process, as it
functions today, is not treated as a linguistic problem, since
retrieval is usually the result of a matching. A similar distinction
between linguistic and non-linguistic means of characterizing
methods for automatic document analysis is made by Coyaud &
Siot-Decauville (1967). These distinctions refer to the process
itself; the linguistic prerequisites have not been considered.

The following presentation is based on the statement that the 
success of the search and retrieval process depends on the organ- 
ization of the thesaurus. Further, it is suggested that "lin- 
guistic" and "non-linguistic" are inappropriate concepts in the 
present context. It would be more adequate to discuss these re- 
lationships along a continuum. This implies that varying degrees 
of structuring are proposed to exist in intermediate languages 
as well as in others. A language with a low degree of structur- 
ing is supposed to have a weak communication capacity. These 
relationships are sketched in Figure 3. 



Figure 3. A model for studying the degree of structuring in
thesauri (dimensions: terminology, syntax, conception)



The degree of structuring may be expressed in three dimensions. 
The first dimension concerns the terminology, which refers to the
selection and organization of the relevant terms to be incorporated
into the subject field. The syntactic dimension refers to
the structural relations that are made explicit within and between
terms. The conceptual dimension refers to the concepts and
conceptual relations formed, which, in a title, represent the 
content of a document. Accordingly, content is defined on the 
basis of the conceptual model employed in the analysis of the 
verbal expressions under consideration (see e.g. Osgood et al.,
1957; Krippendorff, 1969; B. Bierschenk, 1978a). The representation
of content may be based on, e.g., the framework of the subject,
the theory or the psychology of science, or on combinations
of these aspects. (Cf. the discussion on schemata in Chapter 2.)
This model for the study of the field is conceived differently 
from what is usually found in the literature, and so no directly 
relevant references can be given. However, for a general outline 
of the various developmental stages within information science, 
the reader is referred to published volumes of the Annual Review 
of Information Science and Technology, in particular to the chap- 
ters that deal with "Automated Language Processing", "Content 
Analysis, Specification and Control", and "Document Description 
and Representation". Impressively analytic work concerning the 
general orientation in documentary languages and their structure, 
as compared to natural languages, has been done by Coyaud (1966). 
But one has to keep in mind that his work should be judged in the 
light of research on automatic translation, where, in a very deep 
sense, it represents an attempt to find universal features in 
documentary languages.

A general frame of reference can also be obtained from B. 
Bierschenk (1973; 1974a) and Salton (1971), concerning informa- 
tion systems and their way of functioning. Furthermore, thesauri 
themselves often give quite good and concrete descriptions of 
their usage. Some of them have been discussed here. 

By and large, statistical methods will be kept out of the 
present discussion. They are primarily used for the estimation 
of significant words in attribution, and for the generation of 
frequency dictionaries, etc., based on abstracts or full texts. 



3.3 Degree of structuring in thesauri 

An organized terminology with the lowest degree of structuring 
may be said to be an alphabetical list of terms, selected as
significant for a certain field of information. In the infancy of
information science such a term index was the only control tool 
in indexing. Indexing problems grew in time along with growing 
fields of information. Expanded subject areas required expanding 
terminology for their description. At the same time the development
of computer-based I & D systems entailed, as a consequence,
the necessity of a greater degree of formalization in
the intermediate language than before. In the late 1950s and
early 1960s various studies of the indexing process were performed.
Among other things, they pointed to problems as regards
the consistency and selection of terms (see Annual Review of
Information Science and Technology, 1967, Chapter 4). In this
connection a discussion started about one-word and multi-word
descriptors and relations between index terms ("coordinating
indexing") (see e.g. Thesaurus of ERIC Descriptors, 1975, p XIX).
In order to solve these problems, computers became tools for 
automatic extraction of index terms, mainly from titles of scien- 
tific documents. Thus, two techniques for the generation of ter- 
minology can be distinguished: (1) a manual technique, involving 
the use of controlled word lists, and (2) an automatic, uncon- 
trolled technique. Investigations of automatic generation showed 
that the best descriptor was a two-word term, but also that the 
terminology became too bulky and unwieldy, due to, among other 
things, variations in the words denoting one and the same pheno- 
menon. These experiences led to techniques for controlling the 
size of the vocabulary. Rules were needed which stated explicitly
which terms and term combinations were important and should
be allowed from the point of view of subject description, and
which were possible from the point of view of language structure.
By the late 1950s the so-called KWIC indexes (Key Words In Context)
were created for the selection of terms. Each word in a
title could become an index entry, except certain structural
words, called separators. The group of words between one separator
and another contained keywords and their context.
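A KWIC index of the kind just described can be sketched as below, assuming a small hypothetical separator list and taking the rest of the title as each keyword's context, as in the classic rotated KWIC layout. The separator list, the function name `kwic` and the sample title are illustrative assumptions, not taken from any particular KWIC system.

```python
# Hypothetical separator (structural word) list; real KWIC systems used longer lists.
SEPARATORS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def kwic(title: str) -> list[tuple[str, str, str]]:
    """Return (left context, keyword, right context) triples, one per
    non-separator word, sorted alphabetically by keyword."""
    words = title.split()
    entries = []
    for i, word in enumerate(words):
        if word.lower() in SEPARATORS:
            continue  # separators never become index entries
        left = " ".join(words[:i])
        right = " ".join(words[i + 1:])
        entries.append((left, word, right))
    return sorted(entries, key=lambda entry: entry[1].lower())


# Print the entries with the keywords aligned in one column.
for left, word, right in kwic("Automatic Extraction of Index Terms"):
    print(f"{left:>30}  {word} {right}")
```

Every entry keeps the full title, so the context survives in the index; once a term is lifted out of such a list into a thesaurus, that context is lost, which is the problem taken up below.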

Now, when it came to creating an intermediate language which 
could be represented in the form of a thesaurus, it was not only 
the structure of the subject (see Chapter 2) or lexicographic 
aspects that were to be considered. Of importance were also var- 
ious aspects related to the system itself. One of them was the 
functioning of the terminology in automatic identification of
document descriptions by matching against descriptors. Another 
aspect was automatic indexing for content analysis (e.g. Stone 
et al., 1966) of different document texts by matching against 
terms in natural language. It became necessary to have the terms 
normalized, which meant representing them in the shape of base 
forms. Programs for automatic suffix elimination were developed, 
and stop lists determined which words should not be regarded as 
terms. Another thing required in automatic identification was a 
method for standardization of universal linguistic components, 
able to "pick up" terms with identical morphological structure. 
The method is called truncation. Outputs from searches with trun- 
cated terms make it evident that it is a delicate problem to 
identify relevant verbal constructions, at the same time avoiding 
irrelevant ones (see B. Bierschenk, 1973, Chapter 7). A thesaurus 
that relies only on a terminology, no matter how it might be se- 
lected and organized, has a very low degree of structuring. Nor- 
malization and standardization do not solve, e.g., problems of 
homography or synonymy. Further, a term extracted from a KWIC 
index to be incorporated into a thesaurus loses its context when 
its lexical meaning has been determined. 
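Normalization by stop list and suffix elimination, followed by matching with truncated terms, might look roughly like this. It is a minimal sketch: the stop list, the suffix list and the three-letter minimum stem length are hypothetical choices, far cruder than any production stemmer.

```python
# Hypothetical stop list and suffix list (longer suffixes tried first).
STOP_LIST = {"a", "an", "and", "for", "in", "of", "the", "to"}
SUFFIXES = ("ations", "ation", "ings", "ing", "ers", "er", "es", "s")


def base_form(word: str) -> str:
    """Crude suffix elimination: strip the first matching suffix,
    provided a stem of at least three letters remains."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def terms(text: str) -> list[str]:
    """Drop stop-listed words and reduce the remaining words to base forms."""
    return [base_form(w) for w in text.lower().split() if w not in STOP_LIST]


def truncation_match(stem: str, vocabulary: list[str]) -> list[str]:
    """Right truncation: pick up every term that begins with the stem."""
    return [t for t in vocabulary if t.lower().startswith(stem.lower())]


print(terms("Indexing of documents"))   # ['index', 'document']
print(truncation_match("art", ["article", "artificial", "arts"]))
```

The last line shows the difficulty noted in the text: the truncated stem "art" picks up "article" and "artificial" along with "arts", so relevant and irrelevant terms are retrieved together.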

The most obvious lack of communication concerning terminologi- 
cally based thesauri is that the meanings of the terms are un- 
known. This has frustrating consequences for the information 
search. The matching process is only based on identity (same pat- 
tern) or partial identity (e.g. in truncation of word stems), 
i.e. on a coincidence of characters. Boolean algebra is used to 
differentiate between the existence or non-existence of terms, 
but structural relations are beyond its capacity. 
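The limitation can be shown with a small Boolean retrieval sketch, assuming each document is represented only by the set of words in a hypothetical title, minus a hypothetical stop list. Titles 1 and 2 contain the same words in opposite relations, and no combination of AND, OR and NOT can tell them apart.

```python
from functools import reduce

STOP_LIST = {"a", "by", "of", "the"}  # hypothetical stop list

# Hypothetical titles: 1 and 2 use the same words in opposite relations.
TITLES = {
    1: "evaluation of teachers by students",
    2: "evaluation of students by teachers",
    3: "learning of students",
}

# Each document is reduced to a set of terms; word order and relations vanish.
INDEX = {doc: set(title.split()) - STOP_LIST for doc, title in TITLES.items()}


def docs_with(term: str) -> set[int]:
    """Identity match: documents whose term set contains the term."""
    return {doc for doc, doc_terms in INDEX.items() if term in doc_terms}


def all_of(*terms: str) -> set[int]:
    """Boolean AND: intersection of the document sets."""
    return reduce(set.intersection, (docs_with(t) for t in terms))


def any_of(*terms: str) -> set[int]:
    """Boolean OR: union of the document sets."""
    return set().union(*(docs_with(t) for t in terms))


def none_of(term: str) -> set[int]:
    """Boolean NOT: complement with respect to the whole collection."""
    return set(INDEX) - docs_with(term)


print(all_of("evaluation", "teachers", "students"))  # {1, 2}: both match
```

A search intended for evaluation of teachers by students retrieves both documents, because the Boolean operators only register the existence or non-existence of terms.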

Thus the lowest degree of structuring is to be found in automa- 
tically generated term indexes, which are not based on explicitly 
formulated cognitive models but may have an explicitly described 
syntax. The ERIC Thesaurus has tried to make use of the KWIC 
technique for terminological control through its "Rotated Descrip- 
tor Display", in which the allowed combinations of the descrip- 
tors included are given. But this explicit syntax does not address 
the problem of cognition. 

Manually generated thesauri are based on implicit cognitive 
models, and the structuring of the terminology is performed 
through an implicit syntax. In order not to lose the indexers'
and the subject experts' knowledge of the structural relations 
holding between the terms, a system of rules was introduced, relating
the terms to each other (the "related term" symbol). The
relations expressed concern synonymy derived from the conception 
of the subject field, while the implicit syntax is restricted to 
denote the co-existence of the terms with regard to a certain 
aspect of a subject field. The language structure is a noun 
phrase consisting of a main concept and its attribute, mostly in 
two-word combinations. The search logic connects such a noun 
phrase with another or defines an intersection which allows part 
of the two phrases to coincide. 

An example of the utilization of an explicitly described syntax 
is the automatic analysis aided by phrases as presented by Hill- 
man & Kasarda (1969), in which explicit syntactic relations in 
the form of fixed compounds can be extracted from documents and 
also be retrieved through matching of a phrase in a search query. 
However, a problem in automatic syntactic analysis is that many 
phrases relevant to document description are not matched because 
of different embeddings in natural language. Further, the differ- 
ence between a phrase and a multi-word term is slight. The syntac- 
tic analyses performed (e.g. Salton, 1962; Klein & Simmons, 1963) 
utilize so-called function words to demarcate phrases. In scien- 
tific texts it can be assumed that the context between them would 
be of restricted length. Therefore, it is probably a correct 
statement by Salton (1968) that simpler methods of analysis, i.e. 
such as are not syntax-based, would give at least as good results 
in document retrieval, since none of the methods can detect con- 
ceptual relations. 

Questionable results obtained from automatic techniques for
term extraction, together with the low reliability observed in manual
indexing and syntactic analysis, may explain why the PRECIS system
has been so well received by library scientists and librarians. The
system together with its applications is demonstrated in Wellisch (1977).

The general relational system in the PRECIS thesaurus contains 
equivalence, hierarchical relations and associative relations. 
Austin (1977, p 3) claims that the system relies heavily on
linguistic principles. To a large extent, however, it seems to
draw on library routines. Coding (indexing) is performed manually,
but the "shunting" technique is computerized. This technique for 
producing index entries resembles that of KWIC, although the num- 
ber of KWIC entries depends on the number of terms in a title. 
The PRECIS manual requires certain roles to be specified in the 
syntax-based relational system called "concept analysis". This 
term has been chosen because into the system has been built a de- 
pendency structure which, besides indicating notions like agent, 
action, etc., also defines attributive dependencies. The noun 
phrase as a block is kept apart from the transitivity relation 
that is assigned to the action term in the form of "by" or "of" 
("downward reading component" and "upward reading component", 
respectively) . 

The attention paid to PRECIS indexing may be seen as an indication
of a phenomenon that Sparck Jones and Kay seem to be somewhat
surprised at, namely that the linguistic theories developed and
tested in the late 1960s and early 1970s have exerted such limited
influence on document description in information science. The
libraries developed their own systems, partly because of
difficulties in integrating new methods into existing systems. At
the time when the ASLIB-Cranfield, SMART and MEDLARS projects were
running, linguistic theory was central to language research and
discussion. This conclusion can be drawn from the surveys in the
Annual Review of Information Science and Technology between 1967
and 1970. (Compare also Sharp, 1967, with Bobrow et al., 1967.)
Sager (1977, p 76) states that linguists work primarily with
theories for sentence generation, while information scientists
deal primarily with recognition. It seems that a closer connection
between linguistics and information science was not established
until Fillmore (1968) appeared, probably owing to the indications
of cognitive features in his case theory.

The real purpose of PRECIS as a document description system is 
not quite clear, perhaps owing to a confusing mixture of viewpoints
from library science and linguistics. Undoubtedly, however, PRECIS 
is a step in the right direction, due to its use of syntax for 
coding dependency structures. 

A step in the development towards a higher degree of structuring
is the facet-based thesaurus, which makes it possible to connect
aspects of a subject field with a certain specific relationship to
each other (entities and their properties, substances and their
reactions, etc.). This can be expressed both syntactically and
logically. One example of an intermediate language characterized
by an explicit syntax and explicit philosophical relations is
SYNTOL (SYNTagmatic Organisation Language), described by Coyaud
(1966). A SYNTOL analysis of a text can be performed at several
levels, e.g. at a morphemic and a syntagmatic level. The morphemes
are analysed both analytically and synthetically. The analytic
relations are made up of four "formal" classes, i.e. they are of
the philosophical-logical type ("Prédicats", "Entités", "Actions",
"États"). The synthetic relations relate the morphemes dynamically
or statically, with the aid of syntactic conditions.

A SYNTOL syntagm constitutes a representation of a factual
condition. It consists of two lexemes whose syntactic relations to
one another are explicit. In SYNTOL an assertion paradigm ("énoncé
complet") is a kind of schematic representation model for
documentation ("représentation documentaire") in which the "verb
component" is used only to indicate the terminological relation
that is to be stored. No doubt this language employs a cognitive
approach, pointing towards the kind of conceptual representation
that artificial intelligence and cognitive psychology are concerned
with. SYNTOL represents an attempt to create a highly abstract and
general system (it was born in the spirit of the universalists),
and the same theoretical idea seems to underlie Sager's LSP system
(Language String Parser). Although focusing on the structure of
specific subject fields, her work is based on a similar paradigm
(see Sager, 1977).

LSP represents an intermediate language which is highly structured
from both a linguistic and a subject-theoretical point of
view. It is an excellent example of the possibilities provided by 
computational linguistics to analyse scientific texts. LSP uses 
advanced linguistic techniques in combination with statistics. 
The analytical system has been set up empirically, with the aid 
of a linguistic structure. Implicit relations within the subject 
fields are translated into the linguistic format. 

Sager (1977, p 86) writes: 

"...scientific reporting is concerned with estab- 
lishing causal connections between events." 
This approach takes its point of departure from cognitive
psychology. Verbs are of fundamental importance in the generation
of "events", serving as a function (F) in a predicate-argument model.

Sager's response to Sparck Jones & Kay's (1973) request concerning
automatic "deep structure" analysis by means of links and roles
indicated exciting possibilities of development within the field
of thesaurus construction. A display in Sager (1977, p 90) shows
clusters of "events", expressed by clusters of verbs functioning as
operators, together with the arguments represented in the form of
the "roles" that chemical substances play in relation to each
other.

The generation of intermediate languages on the basis of natu- 
ral language texts "of a more restricted kind" holds out hopes 
of a promising future, according to Sager. Linguistic problems 
are easier to handle, especially since the vocabulary in these 
texts is used unambiguously. However, Sager does not state explic- 
itly that her model applies to all scientific thought and work. 
To be sure, she makes the following statement (1977, p 86): 

"Linguistically-based subfield formats are one 
answer to the question of underlying representation. 
While they are based on selectional constraints that 
operate in particular science subfields they have
certain features which may be common to many science 
fields." 

However, she does not seem to realize that the possible impor- 
tance of her model for information science lies in the fact that 
it is aimed at the representation of "cognition", which is what 
information science should deal with. Without it "re-cognition" 
becomes unimportant. The extent to which language structures can 
also represent cognitive structures is an important research con- 
cern. In its attempts to approach that problem area, cognitive 
psychology in the latter half of the 1970s has had a valuable
impact on information science (see Damerau, 1976).

4. A METHOD OF GENERATION BASED ON FUNCTIONAL RELATIONS

In the previous chapter it was suggested that an intermediate 
language which represents the cognitive structure in a message 
possesses a higher degree of precision and structuring than nat- 
ural language. Such structuring should, therefore, be the goal 
of every system whose task is to convey abstracted information. 
Not until terms and their syntactic relations are grounded in an 
explicitly specified cognitive model can they function as units 
of representation in an intermediate language. 

4.1 Starting-points for the construction of a model

Scientific concepts are communicated through scientific documents, 
whose various statements are based on empirical observations 
about events (Sager, 1977; B. Bierschenk, 1978b). The manifest 
representation of assertions about events (statements) may be
described as a sentence, defined as $\text{Noun}_1$-verb-$\text{Noun}_2$,
where the verb denotes the relation between the two nouns. A
sentence will in the following discussion be referred to as
$N_1 v N_2$. These symbols can be used in a description of how
conceptual relations are marked in natural language. $N_1$ and $N_2$
represent labels (see Chapter 2.4) which are used to designate
"sets of information", varying in extent. Therefore, they may also
be called "extensions" (see Lewis, 1972, p 174). The way in which
an extension depends on another is generally denoted by functions.
In this sense, the $v$ symbol is a function. An extension may also
be regarded as an argument which can take different values,
expressed in the form of attributes. In this connection Lewis
(1972, p 177) talks about "intensions". He writes:

"Things are name extensions and values of name intensions; 
sets of things are common-noun extensions and values of 
common-noun intensions; sequences of things are assign- 
ment coordinates of indices. Change the underlying set 
of things and we change the set of extensions, indices 
and Carnapian intensions."

With Lewis's formulation as a starting-point, intension is de- 
fined as 

the properties connoted by a term 

and extension as 
the class of objects designated by a specific term denotation.
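
The two definitions can be made concrete in a small sketch. It assumes, purely for illustration, that an intension is a set of properties and that the extension of a term is the set of objects in a given universe that have all of those properties. The function `extension`, the object names and the properties are hypothetical and are not taken from the study.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical toy model: each object in the universe is described by the
# set of properties it has.
Universe = Dict[str, FrozenSet[str]]


def extension(intension: Set[str], universe: Universe) -> Set[str]:
    """Return the objects that have every property connoted by the term.

    The intension is modelled as a set of properties; the extension is the
    class of objects in the universe that the term designates.
    """
    return {obj for obj, props in universe.items() if intension <= props}


# Invented example data, not taken from the study.
universe: Universe = {
    "report_a": frozenset({"document", "scientific", "titled"}),
    "report_b": frozenset({"document", "titled"}),
    "memo": frozenset({"document"}),
}

titled_document = {"document", "titled"}
scientific_title = {"document", "scientific", "titled"}

print(sorted(extension(titled_document, universe)))   # ['report_a', 'report_b']
print(sorted(extension(scientific_title, universe)))  # ['report_a']

# Same intension, smaller universe: only the extension changes.
smaller = {k: v for k, v in universe.items() if k != "report_b"}
print(sorted(extension(titled_document, smaller)))    # ['report_a']
```

The last call echoes Lewis's remark: when the underlying set of things changes, the extension of a term changes while its intension stays the same.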

Results from research on memory structure (see Cofer, 1976) seem
to indicate that there exists an abstract representation of text
in memory. What is stored seems to be an internal representation
of a proposition. For multi-argument sentences it is easy to
supply the missing argument(s). In developing a model for the
representation of a proposition, the goal should be to be able to
use the context for supplying the missing parts. Thus the $N_1 v N_2$
paradigm should be supplemented with a default variable (\$), which
is a place holder for missing arguments. As van Dijk (1977, p 133)
puts it,

"...propositions may be 'present' without being (fully) 
expressed in the surface structure of the discourse." 

This observation, formulated in connection with a "discourse mod- 
el" according to Krippendorff (1969), has been applied in the 
ANACONDA system concerning the coding of verbal answers obtained 
from interviews (I. Bierschenk, 1977). 

In principle, the $N_1 v N_2$ paradigm could be used as a model of
representation even in applications of information science, since
information may also be defined as being propositional (van Dijk,
1977, p 133). But the logic implied in the $N_1 v N_2$ paradigm is
not sufficient if the purpose is to study the interrelations
between different concepts in a proposition. The linguistic
categories activated by this paradigm would be word classes (i.e.
the "schema" is of the semantic-logical type). For a description
of how scientific information is communicated, however, a
process-oriented model is required, i.e. a proposition model
denoting intentions. By intention is meant

attention directed towards the goal of an action. 

Thus intentionality is a basic property of directed behaviour or 
an action. 

According to Werner & Kaplan (1963), it is the Agent-action-Object
model that is used in the Indo-European languages to denote
intentions. A proposition about an event, a state or conceptual
relations generally consists of these components. A proposition
may therefore be described as the AaO paradigm.
The AaO paradigm has been discussed and defined from a
psycholinguistic point of view in Bierschenk & Bierschenk (1976,
Chapter 2). Its meaning is the following:

Agent is defined as action centre or goal-seeking entity 
making use of various resources in order to achieve its 
goals. This description also includes, besides single indi- 
viduals, groups, organizations and abstractions. 

Action is defined as an act performed by an agent for the 
purpose of achieving a goal. The act defines the meaning of 
the AaO paradigm. 

Objective is defined as everything that an action can be 
directed towards or be performed with. 

The components represented by the AaO paradigm should not be con- 
fused with a case model of the Fillmore type. Fillmore's model 
is not basically different from other philosophical models, since 
it structures the world mainly in semantic-logical terms. 

On the other hand, the correspondence between the $N_1 v N_2$ and AaO
models is evident in the "function component". The a component is
required to identify the parts of the proposition. The action
denotes which object(s) or goal(s) must be present in order for a
proposition to be detected. But fragments of a proposition may
also be present in connection with the AaO paradigm, i.e. single
values may be missing and need to be supplemented. To accomplish
this, the default variable is used.
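
The AaO format and the role of the default variable can be sketched as a small record type. This is a minimal illustration under stated assumptions, not the coding procedure of the study: the class `AaO`, its method `completed` and the use of a dictionary as the "context" are hypothetical. The only point carried over from the text is that a missing value is supplied from the context when possible and is otherwise filled with the default variable, written here as the dollar sign.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional

DEFAULT = "$"  # default variable: place holder for an unexpressed argument


@dataclass(frozen=True)
class AaO:
    """Hypothetical Agent-action-Objective record; None marks a value
    missing from the surface text."""
    agent: Optional[str]
    action: Optional[str]
    objective: Optional[str]

    def completed(self, context: Optional[Dict[str, str]] = None) -> "AaO":
        """Supply each missing value from the context, else use DEFAULT."""
        context = context or {}
        values = {}
        for f in fields(self):
            value = getattr(self, f.name)
            values[f.name] = value if value is not None else context.get(f.name, DEFAULT)
        return AaO(**values)


# "analyserar titlar" (analyses titles): the agent is not expressed.
fragment = AaO(agent=None, action="analyserar", objective="titlar")
print(fragment.completed())
# AaO(agent='$', action='analyserar', objective='titlar')
print(fragment.completed({"agent": "jag"}))
# AaO(agent='jag', action='analyserar', objective='titlar')
```

Values that are already present are never overwritten by the context; only the empty slots are filled.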

Experimental results support the hypothesis concerning the fun- 
damental importance of the AaO paradigm as a format for represent- 
ing propositions (Werner & Kaplan, 1963, p 58). Kintsch's (1974) 
subjects were asked to sort sentences into categories. The sort- 
ing criterion was the relation between nouns in a sentence. The 
study showed that it was easiest to remember a set of nouns act- 
ing as agents. The most difficult to remember was the object role. 
Furthermore, Kintsch's studies seem to indicate that decomposi- 
tion of complex concepts into more elementary ones did not facil- 
itate remembering or understanding. Some kind of "lexical decom- 
position" (p 249) as a psychological process for retrieval is not 
supported by his data. 

When it has been determined what constitutes a proposition in 
the cognitive sense, its different manifestations in language can 
be examined on the basis of natural data, e.g. composition of 
words, variations in sentences due to the number of arguments,
redundancy, etc. 

As has been stressed in the previous chapters, a document can 
be described in many ways. But each description that goes beyond 
purely formal bibliographic information requires the choice of a 
model and thus the adoption of specific assumptions about the 
content of the document. A model intended to represent a proposi- 
tion about the scientific work communicated in a research report 
cannot, contrary to the view held by Sager (1977, p 86), be
"linguistically-based", but should be related as closely as possible
to the theoretical foundations regarded as adequate to the re- 
search process that the content of the document is supposed to 
represent. The paradigm or schema chosen as the format of repre- 
sentation (see Chapter 2.5) is thus determined from the basic 
components of the research process itself. Otherwise, the model 
cannot adequately represent the statement that the author makes 
about the research process by means of the condensed proposition 
in the document title. Nor can the different values assumed by 
the components of the model be interpreted. Sager's (1977, p 86) 
statement that scientific reporting deals with the establishment 
of causal connections between events could be transferred to a 
higher level of abstraction, so that instead of concerning the 
structure of a subject field, it would concern the structure of 
research itself. Relations of a higher order would be established, 
making it possible to derive structural and functional aspects of 
several subject fields and to establish connections between them. 

The single events about which the researcher communicates in- 
formation have appeared in a certain contextual frame, which in 
turn is reported (represented) in a frame of higher order. There- 
fore, the title may be regarded as the proposition that repre- 
sents all the others in a particular information set. Similar 
points are made by van Dijk (1977). 

It is generally accepted that "problem", "method" and "goal" 
are the fundamental components in the research process (see 
Bunge, 1967, p 6). The "method" component explicitly denotes the 
way in which the research for new information is to be done. The 
"Problem-Method-Goal" model denotes the aim (direction) in the
research process, namely a conscious steering towards or a
systematic and goal-oriented search for new information. An
abstracted or "schematic" proposition will in the following discussion
be referred to as the PmG paradigm. The single components of the 
paradigm have been presented in B. Bierschenk (1974) from the 
point of view of research theory. Their meanings are the follow- 
ing: 

Problem is in this context defined as something that is re- 
flected against a scientific background and that is to be 
solved by scientific means for the purpose of creating new 
information. In this respect the problem component has a 
governing function in connection with scientific activity. 
There is (implicit) intentionality involved, which makes it 
possible to compare the role of this component with that of 
the agent in the AaO paradigm. 

Method is defined as all scientific activity performed for
the purpose of showing that a problem can be solved com- 
pletely, partly, or not at all. Rules concerning this acti- 
vity are specified, aiming at minimizing different kinds of 
error sources. The rules concern, in principle, the re- 
searcher's (1) way of approaching the problem, (2) planning, 
and (3) instrumentation. If these are fixed, the individual 
researcher will act stereotypically or in a scientifically 
sterile way. Therefore, it is of considerable importance 
that problems are formalized in such a way that the formal- 
ization supports the use of adequate methods. This should 
be done in the form of hypotheses which can be tested against
different kinds of criteria. 

Goal is defined as an explicit formulation of the governing 
idea included in the problem component. It concerns repre- 
sentations of goals, levels of achievement, and anticipated 
solutions. 

The components in this abstract proposition model should, like 
abstractions in general, be regarded as aggregations of concepts 
and conceptualizations. The values (i.e. the types of terms that 
an argument can take) that are assigned to each component may in 
themselves be of different kinds, and can be further categorized 
and analysed. It may be appropriate to mention that the three 
components are not comparable with such linguistic categories as, 
e.g., word classes, sentence constituents, or cases. The appear- 
ance of a "verb" is readily expected under the Method component. 
But in fact a research technique (which is one of several reali- 
zations of the method) may be called "intervju" (an interview) , 
and a problem (what a researcher tries to solve) may be referred 
to as "att intervjua" (to interview) . 

Analysing a title assigned to a scientific text from the point 
of view of a linguistic representation of conceptual relations
(here scientific concepts) implies a study of the result of an 
abstract representation of the text as manifested in a title. 
Such a study is based on cues provided by the manifest struc- 
ture of the title. Since the overt organization of the title may 
be fragmentary and restricted with respect to the PmG paradigm, 
a default variable is required even in this case, serving as a 
place holder for missing arguments. 

In order to analyse the relations between the single components 
in the title, a way of indicating the components' roles and struc- 
tural connections is needed, referring to the theoretical start- 
ing-points of the model. 

4.2 Presentation of the model 

Human cognition is supposed to be based on spatially organized 
representations of phenomena, as pointed out by, among others, 
Piaget & Inhelder (1956) and Miller & Johnson-Laird (1976). 
Basic experiments have been performed by Piaget & Inhelder, indi- 
cating that the first spatial understanding in children is topol- 
ogical. Children obviously understand proximal properties like 
order, demarcation and continuity. Not until later developmental 
stages have been reached do properties such as angles, parallelism 
and distance become comprehensible. Piaget's (1963) opinion is 
that adults build up implicit cognitive representations (schemata) 
consisting of coordinates. These schemata are employed for orien- 
tation in space and time. 

Starting on the assumption that a title is an abstracted propo- 
sition about a research process, it can be stated that the single 
components in the process must be distinguishable for the title 
to be properly understood. Therefore, it seems natural that in 
language, too, there should exist a construction which makes it 
possible to demarcate single components. This structuring role 
appears to be assigned to prepositions. Prepositions have always 
played an important organizing role in linguistic and computa- 
tional linguistic analysis. They have functioned as separators in 
automatic text processing, having no other role than to dissect 
segments (Hillman & Kasarda, 1969; I. Bierschenk, 1978) or to
extract index phrases (Salton, 1962; Braun & Schwind, 1975). They
have functioned as markers of roles or cases in syntactic and
semantic analyses (Friedman, 1973; I. Bierschenk, 1977; Cedvall,
1977). Further, they have functioned as statistical variables in
automatic estimation of distances between significant terms in a
document vocabulary (O'Connor, 1973). But as far as I know, they
have not been used as functions in an analysis based on an
explicitly described conceptual model in information science.

A typical title of a scientific text can be:

En analys av titlar                                   (1)
(An analysis of titles)

The "scientific event" underlying this title may be described in
terms of statements like

Jag analyserar titlar        (I analyse titles)
Jag har analyserat titlar    (I have analysed titles)

The "event" condensed here derives both the agent and the action 
from En analys (An analysis). The transformational level is
marked by the preposition av (of). Before the transformation the
verb form indicated the concept titlar (titles) as being an
"object". This role is still to be discerned after the
transformation through the function of the preposition av (of).

The goal of scientific inquiry, however, is not to handle per- 
sons or solid objects, but to deal with problems. However, prob- 
lems also imply that there are possible solutions, which means 
that problems determine the research process in the same way that 
the object in a sentence determines what type of verb may be se- 
lected. Therefore, it might be justified to have the label "ob- 
ject" replaced by the label "problem". As previously mentioned, 
problems include intentions; consequently, the role of the component
in the PmG paradigm is comparable with the Agent in the AaO
paradigm. The preposition av (of) then functions as an operator 
for the Problem component. When the problem has been identified, 
En analys (An analysis) remains to be analysed. This part of the 
title can now be given an unambiguous interpretation, i.e. it 
denotes the scientific event, as manifested in the methods or 
means used. These two components can be fitted into the PmG para- 
digm as follows: 

Kategori (Category)   PROBLEM (PROBLEM)   METOD (METHOD)   MÅL (GOAL)
Operator (Operator)   av (of)
Representation        titlar (titles)     En analys        $
(Representation)                          (An analysis)

Scientific activity cannot be equated with a determinable object 
or determinable problems, but should be defined as a strategy, 
i.e. a way of tackling problems. The development of new methods 
and instruments increases the individual researcher's possibili- 
ties of creating new information. For this reason, the informa- 
tive value of the title increases if it contains information on 
the research strategy or technique used. If the information is 
expanded, so that the analytical technique is made explicit, the 
title can be given as

En analys av titlar med en kodningsalgoritm           (2)
(An analysis of titles with a coding algorithm)

As shown so far, the research strategy is En analys (An analysis).
The preposition med (with), however, specifies in more detail 
what plans, techniques, or instruments have been employed. The 
new information can be fitted into the PmG paradigm as follows:

Kategori (Category)   PROBLEM (PROBLEM)   METOD (METHOD)   INSTRUMENT (INSTRUMENT)   MÅL (GOAL)
Operator (Operator)   av (of)                              med (with)
Representation        titlar (titles)     En analys        en kodningsalgoritm       $
(Representation)                          (An analysis)    (a coding algorithm)

Means or instruments play a central role in science. It is therefore
reasonable to assume that med en kodningsalgoritm (with a coding
algorithm) is an explicit expression of what researcher X does,
i.e. his way of analysing titlar (titles). Therefore, all concepts
denoting means may be arranged under the Instrument component. In
research these are seldom solid objects (tools). However, they do
have a more concrete function in connection with the method. This
is why, in the example under discussion, the Instrument component
is made explicit and placed between Method and Goal. (The goal
determines the instrumentation of a method.)

If the title is further expanded, so that it also contains an 
explicit statement of the goal, it may appear in the following 
form:

En analys av titlar med en kodningsalgoritm för       (3)
begreppsigenkänning
(An analysis of titles with a coding algorithm for
concept recognition)

The goal is, in this case, to recognize concepts. As example (3) 
shows, the preposition för (for) denotes this intention, i.e. it
gives the reason why a certain act that requires certain instru- 
ments has been performed. The information about the goal can be 
fitted into the PmG paradigm as follows:

Kategori (Category)   PROBLEM (PROBLEM)   METOD (METHOD)   INSTRUMENT (INSTRUMENT)   MÅL (GOAL)
Operator (Operator)   av (of)                              med (with)                för (for)
Representation        titlar (titles)     En analys        en kodningsalgoritm       begreppsigenkänning
(Representation)                          (An analysis)    (a coding algorithm)      (concept recognition)
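
Examples (1) to (3) suggest a simple procedure: the words before the first preposition fill the Method register, each of av, med and för opens the register of Problem, Instrument and Goal respectively, and the default variable stands in for any component the title leaves unexpressed. The following Python sketch implements only that reading of the three examples. The operator table, the function name `code_title` and the whitespace tokenization are assumptions for illustration, not the coding algorithm of the study.

```python
from typing import Dict, List

# Hypothetical operator table read off examples (1)-(3): each preposition
# opens the register of the component that follows it.
OPERATORS: Dict[str, str] = {"av": "PROBLEM", "med": "INSTRUMENT", "för": "GOAL"}
COMPONENTS = ("PROBLEM", "METHOD", "INSTRUMENT", "GOAL")
DEFAULT = "$"  # default variable for a component the title leaves unexpressed


def code_title(title: str) -> Dict[str, str]:
    """Assign the words of a title to PmG registers, using prepositions as operators."""
    registers: Dict[str, List[str]] = {c: [] for c in COMPONENTS}
    current = "METHOD"  # words before the first operator name the method
    for word in title.split():
        component = OPERATORS.get(word.lower())
        if component is not None:
            current = component  # the operator itself is not stored
        else:
            registers[current].append(word)
    return {c: " ".join(words) if words else DEFAULT for c, words in registers.items()}


print(code_title("En analys av titlar"))
# {'PROBLEM': 'titlar', 'METHOD': 'En analys', 'INSTRUMENT': '$', 'GOAL': '$'}
print(code_title("En analys av titlar med en kodningsalgoritm för begreppsigenkänning"))
# {'PROBLEM': 'titlar', 'METHOD': 'En analys', 'INSTRUMENT': 'en kodningsalgoritm', 'GOAL': 'begreppsigenkänning'}
```

Because the sketch always keeps four registers, title (1) receives the default variable under both Instrument and Goal, whereas the schema for (1) lists no Instrument column at all. Other prepositions, punctuation and subtitles are not handled.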

What has been described so far are the main components of the 
PmG paradigm, illustrated by a particular research strategy or 
schema. The representation of this strategy is the result of a 
course of events, which should be seen as movement in space and 
time. But since every kind of movement requires a description in 
space and time, this would be trivial information in a title. 
Therefore, in general, the concrete place and time of a particu- 
lar research activity are not specified (and hardly ever is in- 
formation given concerning the place and time of the writing of 
the report itself). If, nevertheless, place and time are indicated,
the most appropriate thing, from a linguistic point of view, would
be to let space and time become determiners to the sentence itself,
i.e. to the verb component. But in connection with the
transformation of a "concrete" natural language expression of a
course of events, e.g. "Jag har analyserat titlar i flera månader"
(I have analysed titles for several months), into the abstracted
intermediate form "En analys av titlar" (An analysis of titles),
the temporal aspect of the verb form and the denotation of time by
the prepositional phrase are nullified.
Transferred to the PmG paradigm, space and time would determine 
the method. However, since the method itself determines the re- 
sults of the research, space and time are irrelevant concepts. 

Solid objects are considered with respect to their relative lo- 
cation (see Ralph, 1977). Depending on the dimension in which the 
object is demarcated, this may be expressed with prepositions, 
for example i (in) for space and på (on) for surfaces. As was
initially mentioned in this section, research seems to indicate 
that human cognition is based on spatially organized representa- 
tions, developing from a topological to a multidimensional stage, 
represented as systems of coordinates. The child learns to see 
relations between solid objects and to express these relations 
by means of prepositions. The ability to form concepts (abstrac- 
tions) comes later, although the prepositions used to relate 
abstractions are the same. Language conventions are often the 
reason why a preposition with a plainly two-dimensional function, 
e.g. på (on), is used to denote a more abstract relationship.
Prepositions denoting space are also used to specify time, since 
time, too, has direction and extent. In different kinds of auto- 
matic analysis of natural language this ambiguous use of preposi- 
tions is a great disadvantage for the determination of the mean- 
ing of the component that follows. But since scientific titles 
convey abstract relations on an intermediate level, the ambiguity 
that is necessary in more concrete contexts is eliminated in the 
same way as certain aspects of verbs are no longer relevant after 
a certain abstraction has been performed. 

Research focuses on problems which are multi-faceted. This im- 
plies that a title always gives expression to multidimensional 
phenomena. But problems, too, may be determined as being part of 
a problem area which incorporates a time dimension. (For a dis- 
cussion, see Miller & Johnson-Laird, 1976.) This is expressed in 
the title:

En analys av begrepp i titlar från fyra decennier     (4)
(An analysis of concepts in titles from four decades)

The preposition i (in) determines where the begrepp (concepts) 
are to be found, i.e. in titles and in no other type of text. The 
preposition i (in) here denotes a demarcation, not an inclusion
(cf. Ralph, 1977, p 5), since this example concerns the
demarcation of a problem, not the localization of a concrete
place. The preposition från (from) is used in contexts of space, denoting a
starting-point. It has the same meaning when used on the time
dimension. These relations may be incorporated into the PmG paradigm
as follows:

Kategori (Category)   PROBLEM (PROBLEM)                                   METOD (METHOD)   INSTRUMENT (INSTRUMENT)   MÅL (GOAL)
Operator (Operator)   av (of)       i (in)       från (from)
Representation        begrepp       titlar       fyra decennier           En analys
(Representation)      (concepts)    (titles)     (four decades)           (An analysis)
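
In example (4) the prepositions i and från do not open new components but demarcate the Problem component further, in the order given by the title. One hedged way to record this is to keep each (operator, phrase) pair, in title order, inside the register it belongs to, so that the last entry of a register is the most specific demarcation, farthest from the kernel. The operator table and the functions `segment_title` and `code_demarcations` are hypothetical, and treating every occurrence of i and från as a Problem demarcation is a simplification: the text notes that the component a demarcation belongs to depends on where it is inserted in the schema.

```python
from typing import Dict, List, Tuple

# Hypothetical operator table: i (in) and från (from) are read, as in
# example (4), as further demarcations of the Problem component.
OPERATORS: Dict[str, str] = {
    "av": "PROBLEM", "i": "PROBLEM", "från": "PROBLEM",
    "med": "INSTRUMENT", "för": "GOAL",
}
COMPONENTS = ("PROBLEM", "METHOD", "INSTRUMENT", "GOAL")
DEFAULT = "$"  # default variable for an unexpressed component

Segment = Tuple[str, str]  # (operator, phrase); operator "" = no preceding preposition


def segment_title(title: str) -> List[Segment]:
    """Cut the title at every operator, keeping the segments in title order."""
    segments: List[Tuple[str, List[str]]] = [("", [])]
    for word in title.split():
        if word.lower() in OPERATORS:
            segments.append((word.lower(), []))
        else:
            segments[-1][1].append(word)
    return [(op, " ".join(words)) for op, words in segments if words]


def code_demarcations(title: str) -> Dict[str, List[Segment]]:
    """Collect segments per component; the last entry in a register is the
    most specific demarcation."""
    registers: Dict[str, List[Segment]] = {c: [] for c in COMPONENTS}
    for op, phrase in segment_title(title):
        # The opening segment (operator "") is assigned to the Method register.
        registers[OPERATORS.get(op, "METHOD")].append((op, phrase))
    for c in COMPONENTS:
        if not registers[c]:
            registers[c].append(("", DEFAULT))
    return registers


coded = code_demarcations("En analys av begrepp i titlar från fyra decennier")
print(coded["PROBLEM"])
# [('av', 'begrepp'), ('i', 'titlar'), ('från', 'fyra decennier')]
print(coded["METHOD"], coded["GOAL"])
# [('', 'En analys')] [('', '$')]
```

Keeping the operator with each phrase preserves the strict sequential order of the demarcations, which a single joined string per register would lose.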

It should also be mentioned that geographic places, too, are re- 
garded as abstract concepts in this model. Depending on where in 
the schema such a name is inserted, it demarcates the component 
under consideration. This is discussed in greater detail in the 
next section. 

As can be seen from the above examples, a strict sequential 
order is described. The last demarcation specifies begrepp (con- 
cepts) in such a way that it concerns not only titles but also a 
particular period of time. There seems to be a need for a time 
dimension in order to discriminate between experiences from dif- 
ferent periods of time. 

According to Oller & Sales (1969) the principle of concentric
order seems to be of general relevance in the analysis of sen- 
tences, in the sense that the most specific information is located 
farthest away from the sentence kernel. This principle will here 
be employed in the sense that the important organizational func- 
tion of the prepositions will form the basis for automatic demar- 
cation and determination of format. The operationalization will 
be demonstrated in the following section. 
4.3 Operationalization of the model

The creation of order presupposes a schema within which the order 
is to be set up. Thus the task of creating order within titles 
first of all implies the determination of what constitutes a ti- 
tle. The function of a title has been discussed previously and it 
may seem superfluous to go further into the concept of title at 
this stage. But in the coding and editing of thousands of titles 
written in several languages, it soon becomes evident that titles 
may be structured in many ways, perhaps as a consequence of their 
function. One can distinguish the length of the title, the form
and colour of the letters, the title's localization on the cover
of the document, typographical variations concerning capitals and
small letters in main titles and subtitles, different kinds of
punctuation marks to organize the various entities in the title,
and so on. But in writing instructions for the automatic
organization of the entities, not all dimensions of the title are
considered, because many of them have no significance in the
representation of document content. However, when large bodies of
data are to be handled, a unified format must be determined which
indicates the limits within which interpretation and inference are
allowed.

The abstractions and the relations between them conveyed by a 
scientific title are the visual result of the author's conceptua- 
lization at a certain point in time. The processes that preceded 
this conceptualization, i.e. the formation of the scientific con- 
cepts, are no longer distinguishable, at least not in one and the 
same title. Thus a reconstruction of the processes involved is
hardly possible once the conceptualization has been completed
(cf. Kintsch, 1974).

A title represents one or more conceptualizations, which in 
their manifest form can consist of more or less complex language 
structures. A title consisting of one word consequently expresses 
a conceptualization whose structural relations are implicit, 
while a highly structured title explicitly indicates such
relations, e.g. through prepositions. What is important for the
development of a system of rules is to mark a detectable boundary
for a conceptualization. In order to avoid difficulties of
interpretation, it will be necessary to utilize purely orthographic
marks. In the instructions concerning the punching of the
material, it was stated that a subtitle should be separated from
the main title by a full stop (which is usually avoided in titles
for reasons of layout). Semicolon, colon and dash also serve as
demarcators. Titles such as the following express two
conceptualizations:

Fragor kring studiedagar - en enkatundersokning (5)

(Questions about study days - a questionnaire investigation)

De akademiska undervisningsformerna (6)
Universitetspedagogik

(The academic teaching forms)
(University pedagogy)



The titles are authentic. In order to maintain the structure of 
the Swedish wording, it has been necessary to give a word-by-word 
translation. This applies also to the examples given in Chapter 
6. 

Another (albeit rarely seen) type of boundary marker is repre-
sented by conjunctions coordinating two conceptualizations, i.e.
conjunctions that grammatically function as connectors:

En empirisk studie av kognitiv utveckling samt (7)
en kritisk analys av intelligensbegreppet

(An empirical study of cognitive development and [also]
a critical analysis of the intelligence concept)

This boundary marker merely separates two conceptualizations from 
one another rather than connecting them. The difference between 
these and the ones that are demarcated by a punctuation mark is 
that they are explicitly marked as coordinated. But the boundary 
marker samt (and [also]) signals that a new conceptualization has 
to be coded; consequently, samt (and [also]) is regarded as a dis- 
connector. 

Conjunctions functioning as "real" conjunctions coordinate
concepts within the same conceptualization. Commas, added during
punching, also function as connectors:

Inlarningsmaskiner och programmerade hjalpmedel (8) 

(Learning machines and aids for programmed instruction) 

Familj, skola, samhalle (9) 

(Family, school, society) 

Matning av sprakfardighet i engelska och tyska (10)

(The measurement of language proficiency in English 
and German) 
These instructions and decisions belong to the kind of order-creating
rules that may be called demarcation rules. Another form
of demarcation which serves an editing function is the use of 
brackets. In order not to disconnect such an entity from the one 
it belongs to (in the function of explaining, etc.), it became 
necessary to disregard certain prepositions within those entities. 
By this arrangement they are automatically assigned to the 
nearest preceding entity. 
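The demarcation rules can be illustrated by a minimal Python sketch, which is not the original ASCII FORTRAN program. It assumes that the titles have already been edited so that demarcating full stops, semicolons, colons and dashes stand between blanks, treats samt (and [also]) as a disconnector, and keeps a bracketed group as one token, so that prepositions inside it cannot operate and the group stays with the preceding entity. The function name split_conceptualizations is hypothetical.

```python
import re

# Demarcators assumed from the text: full stop, semicolon, colon and dash
# (after editing, each stands alone between blanks).
DEMARCATORS = {".", ";", ":", "-"}
# Conjunctions that open a new conceptualization (disconnectors).
DISCONNECTORS = {"samt"}


def split_conceptualizations(title: str) -> list[list[str]]:
    """Split an edited title into conceptualizations (lists of tokens)."""
    # A bracketed group is kept as one token, so any preposition inside it
    # cannot operate and the group stays with the preceding entity.
    tokens = re.findall(r"\([^)]*\)|\S+", title)
    segments: list[list[str]] = [[]]
    for tok in tokens:
        if tok in DEMARCATORS or tok.lower() in DISCONNECTORS:
            if segments[-1]:          # avoid empty conceptualizations
                segments.append([])
            continue
        segments[-1].append(tok)
    return [seg for seg in segments if seg]


print(split_conceptualizations("Fragor kring studiedagar - en enkatundersokning"))
# [['Fragor', 'kring', 'studiedagar'], ['en', 'enkatundersokning']]
```

Each resulting list is then coded separately by the structuring rules.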

The demarcation rules have provided the outer framework of the 
model. Two types of rules then form the basis for the conceptual 
coding, namely stop rules and structuring rules. 

As indicated in the previous section, the analytical method 
employed rests on certain basic assumptions concerning the func- 
tion of the prepositions in a title, which is to indicate, or 
point forward towards, certain types of concepts. In this func- 
tion their position relative to one another is of considerable 
importance. From the model it is obvious that the prepositions
("pointers") belonging to the main components are essentially
av (of), med (with) and for (for). These prepositions propel "the
action" forward, thus expressing the transitivity in the paradigm
("the horizontal level"). Prepositions demarcating the main
components are located between them and in a specified order. As a
consequence, the concepts they point towards are ordered
"vertically" under the nearest preceding main preposition. This
state of affairs can be compared with, e.g., Abelson's (1973) and
Faught's (1977) models of "belief systems". Faught (1977, p 5)
writes:

"Humans use intensional constructs such as beliefs and
intentions to order the environment and direct their
behavior."



Intentions are realized through actions and are constructed through 
conceptualizations. In the PmG paradigm, this corresponds to the 
components being distinguished by means of av (of), med (with)
and for (for), on the one hand, and being demarcated and defined
by means of prepositions such as i (in), pa (on) and fran (from),
on the other.

Within the theoretical context presented in the previous sec- 
tion, the former type of prepositions will be called intentional. 
On the assumption that a concept is perceived as an extension
or denotation with respect to the properties and characteristics
that form the intensions or connotations of the concept, an
explicit specification of the concept's intension is regarded as
an extension. This extension should be considered to be spatial.
The extension functions as a demarcation, i.e. as a visual
signal. Therefore, the latter type of prepositions will be called
extensional.

This is illustrated in the figure below: 
Figure 4. Sketch of the organizing principles in titles

In Chapter 4.2 the use of the so-called locative prepositions in 
the localization of abstractions was mentioned. With respect to 
the original meaning of these prepositions it was pointed out 
that their ambiguity when used with objects of a concrete kind 
is nullified when the objects themselves are of an abstract kind. 
The assumption that prepositions act as functions, which is of
interest from the point of view of computational linguistics, also
has psychological relevance in this model. A function
does not have any meaning of its own. As soon as it has performed 
its task of relating two concepts (comparable to placing some- 
thing in a system of coordinates, see Chapter 4.2), it loses its 
importance. Only the functional relation between the concepts 
remains. 

The natural-language conception of prepositions has proved
inadequate for the analysis of conceptual relations. For example,
Brodda (1973) points out that what he calls "inner" (logical,
cognitive) cases are not represented by prepositional phrases but
to a considerable extent by subordinate clauses in Swedish. A
case model for the representation of "inner" cases, therefore, 
cannot be based on a prepositional framework. This is an impor- 
tant observation applying to sentences in natural language, for 
which linguistic models are generally developed. The problematic 
prepositions correspond to those that in the present context are 
called intentional. Therefore, it may be of interest to take a 
closer look at the distribution and function of these preposi- 
tions in a large corpus of natural language. 

In Nusvensk Frekvensordbok (Frequency Dictionary of Present Day
Swedish), part 4 (NFO 4), covering the language of the daily
press (see Allén et al., 1980), the following principal facts
can be extracted with respect to the intentional prepositions
(as they are called here). Phrases beginning with av (of) are
mainly said to indicate "origin in time and space", "source of
event" and "focused object". Phrases beginning with med (with)
are chiefly explained as having an "associative function". In the
function "taking place through" (instrumental) they seem to have
a relatively low frequency. The main functions of for (for) are
given as "with indicated purpose" and "with influence on someone".

These examples provide a good illustration of the discussion
in Chapter 3 concerning differences between natural and artificial
language. The variability in natural-language expressions allows
for variations in interpretation, which can be utilized for more
or less finely graded analyses. In this case, determining the
"meaning" of a preposition depends on its context. The variation
that can be shown with such an analysis, however, concerns the
domain of usage. The more artificial the text under analysis is,
the greater the precision required in the definitions of the
prepositions. In this perspective, such variations in "meaning" as
those presented in NFO 4 can be seen as paraphrases, which are
stabilized at the intermediate level, i.e. a basic function
emerges. Thus the reduced syntactic structure in titles must not
give rise to differences in interpretation.

It follows that an interesting research question is the extent
to which the same basic function is assumed and used in different
texts. The preposition av (of), as described in NFO 4, may be
compared with the function of av (of) in scientific titles. The
"natural" texts focus on a kind of agentive meaning which, as
already mentioned, does not obtain in titles. Instead, an Object
function emerges, called "problem". The function of for (for)
evidently denotes intention, i.e. a representation of a goal, and
also a Result or a Recipient aspect, both of which may be included
in Goal. The meaning "with influence on someone" might concern
persons as the goal of an action. The preposition med (with),
which in NFO 4 primarily denotes an associative function, has in 
the PmG paradigm been defined as denoting an instrument. Differ- 
ences may be caused by different interpretations of the term 
"Instrument", depending on the model within which the term be- 
longs. 

Welin (1974) discusses the ambiguity of prepositional phrases 
in titles in an attempt to analyse the possibility of correctly 
determining their structural relations from the point of view of 
information and documentation. But Welin's material is not
homogeneous, which means that ambiguity is likely to be difficult
to define (ambiguity in relation to what?). Furthermore, different
interpretations of phrases are discussed on the basis of
linguistic assumptions that may not be relevant. The doubt
expressed by Welin regarding the feasibility of automatic analysis
of prepositional phrases presumably derives from the fact that he
discusses the matter without an explicitly formulated
information-oriented model.

In the present analysis, the "schematic" organization in titles
for automatic coding is emphasized (see Fig. 4). The sequential
order among the prepositions is employed in this theoretical
context by way of numerical codes expressing the relations between
the "axes" in a coordinate system as follows:

En analys av begrepp i titlar (11)
(An analysis of concepts in titles)
40 30 33



The concepts expressed by the intentional prepositions are
assigned numerical code numbers ending in "0". The associated
extensional concept is assigned another number, in a sequential
order that begins with a number other than zero. A further concept
following titlar in example (11) would have the numerical code
number 34, and so on. The Method component is assigned code number
40. This system is set up in such a way that the sequential order
is algorithmically fixed. For practical reasons, the coding system
has been taken over from the ANACONDA system (Bierschenk &
Bierschenk, 1976, p 40). It should be kept in mind that the
ANACONDA model is applied to natural language (interviews), thus
including more components than does the PmG model. Certain codes,
therefore, are left empty in PmG, while others possess a
generalized meaning because of the higher level of abstraction in
the PmG model.
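The numbering can be sketched as a small Python function. The increments are an assumption, read off the code sequences in this chapter (30, 33, 34 here, and 70, 73, 74 and 80, 83, 84 in the tables of Rules 12 and 13): the first extensional concept under a main component receives the main code plus 3, and each further concept adds 1. The function name subordinate_codes is hypothetical, and the sketch does not reproduce the ANACONDA code tables.

```python
def subordinate_codes(main_code: int, count: int) -> list[int]:
    """Codes for `count` extensional concepts ordered under `main_code`.

    Assumed scheme (inferred from 30 -> 33 -> 34): the first subordinate
    concept gets main_code + 3, each following concept adds 1, and all
    codes stay within the main component's decade.
    """
    if main_code % 10 != 0:
        raise ValueError("main component codes end in 0")
    if count > 7:
        raise ValueError("at most seven subordinate codes fit in one decade")
    return [main_code + 3 + i for i in range(count)]


# "En analys av begrepp i titlar ...": begrepp = 30, titlar = 33, next = 34
print(subordinate_codes(30, 2))  # [33, 34]
```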

The prepositions may have varying contexts in these titles, which
must be precisely definable, i.e. the "length" of the prepositions
must be specified, for there are a number of multi-word
expressions that may be regarded as prepositions (cf. Welin,
1974, p 138). Such a compound preposition may consist of several
strings of characters (see Box 4 below). Prepositions may also be
part of fixed phrases, where they do not have any pointer
function. Such strings have been listed in a dictionary. (A
multi-word phrase may here be preceded by an operating
preposition, e.g. inom ramen for (within the frame of), where
ramen for (the frame of) is specified as a multi-word phrase; in
this role it does not allow for (of) to operate. The operating
preposition is inom (within).)
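Rules 1 and 2 below (control of multi-word prepositions and phrases) can be sketched as a longest-match pass over the tokens of a title. In this Python sketch, which is an illustration rather than the original program, a multi-word preposition is replaced by the single preposition it functions as (the "=" column of Box 4), and a multi-word phrase is fused into one token so that its internal preposition cannot operate. Only a few Box 4 entries are included, spelled without diacritics as elsewhere in the text; the function name normalize and the example titles are invented for illustration.

```python
# A few entries from Box 4: multi-word preposition -> functional equivalent.
MULTIWORD_PREPOSITIONS = {
    "i anslutning till": "i",
    "i samband med": "i",
    "i fraga om": "i",
    "med hjalp av": "med",
    "med tonvikt lagd pa": "i",
    "jamford med": "i",
}
# A few multi-word phrases from Box 4; their prepositions have no pointer function.
MULTIWORD_PHRASES = ["typer av", "grad av", "ramen for", "exempel pa"]


def normalize(tokens: list[str]) -> list[str]:
    """Collapse multi-word prepositions (Rule 1) and phrases (Rule 2)."""
    entries = [(p.split(), [f]) for p, f in MULTIWORD_PREPOSITIONS.items()]
    entries += [(p.split(), ["_".join(p.split())]) for p in MULTIWORD_PHRASES]
    # Longest entries first, so that the longest match wins.
    entries.sort(key=lambda entry: len(entry[0]), reverse=True)
    lowered = [t.lower() for t in tokens]
    out: list[str] = []
    i = 0
    while i < len(tokens):
        for words, replacement in entries:
            if lowered[i:i + len(words)] == words:
                out.extend(replacement)
                i += len(words)
                break
        else:  # no dictionary entry starts at this token
            out.append(tokens[i])
            i += 1
    return out


print(normalize("Studier inom ramen for skolreformen".split()))
# ['Studier', 'inom', 'ramen_for', 'skolreformen']
```

In the example, inom (within) remains the operating preposition, as in the text.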

The number and the type of fixed combinations probably differ
somewhat between subject fields. The dictionary specified here is
based on characteristics of the authentic material (for further
description, see Chapter 5). One characteristic feature of many
variants of natural language is the use of phrases of several
kinds (cf. Allén, 1976), which could be an important explanation
for the polysemic features of prepositions (as they have been
analysed in NFO 4). The titles examined in the present analysis
have been collected from works written by a randomly selected
sample of researchers from a population in which a specific
definition of "researcher" has determined what titles were to be
included in the analysis. A scientific title, then, represents
work carried out by a researcher. Since researchers hold different
posts and have specialized in different fields, differences with
respect to a certain researcher and among researchers may be
reflected in the titles of their different works. This may give
rise to the utilization of phrases and combinations typical of
more concrete expressions than would be expected in titles.

Rules for treating such different cases of combinations
involving prepositions are here called stop rules. Prepositions
may be combined with other prepositions, either directly (as in
"Tysk skola av i dag" (The German school of today) or "Varfor
lararna inte kan vara med i Hem-och-skola" (Why teachers cannot
join the Home-School Society)), or linked by means of a
conjunction ("For och mot den nya skolan" (For and against the new
school [system])). Preposition and conjunction may also form a
pair ("Bakgrund till och tolkning av..." (Background to and
interpretation of...), or "...pa lagstadiet och pa fritidshem"
(...in primary school and in centres for children's leisure
activities)). Specification of such cases prevents coding errors.
Otherwise, the rules for concept coding will interfere with each
other when concepts are linked by way of a connector (the
conjunctions och (and) and eller (or), or a comma). In the same
way as an interconnection takes place in ordinary sentence
analysis, the concepts on both sides of a connector should be
assigned the same numerical code number.
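The stop rules can be illustrated by a Python sketch that marks which prepositions keep their pointer function. It follows the wording of Rules 4 to 8 and 11 below (an initial or terminal preposition does not operate, nor does a preposition followed by a preposition or a conjunction, or one preceded by a conjunction), but it is an illustrative reconstruction, not the original program; the word lists are partial, and the function name operating_prepositions is hypothetical.

```python
# Partial word lists (spelled without diacritics, as in the text).
PREPOSITIONS = {"i", "av", "for", "pa", "till", "med", "om", "inom", "vid",
                "mot", "fran", "mellan", "genom", "under", "hos", "kring"}
CONNECTORS = {"och", "eller", ","}


def operating_prepositions(tokens: list[str]) -> list[int]:
    """Return the indices of prepositions that keep their pointer function."""
    words = [t.lower() for t in tokens]
    n = len(words)
    operating = []
    for k, word in enumerate(words):
        if word not in PREPOSITIONS:
            continue
        if k == 0 or k == n - 1:               # initial or terminal preposition
            continue
        nxt, prev = words[k + 1], words[k - 1]
        if nxt in PREPOSITIONS or nxt in CONNECTORS:
            continue                           # followed by preposition/conjunction
        if prev in CONNECTORS:
            continue                           # preceded by a conjunction
        operating.append(k)
    return operating


print(operating_prepositions("Bakgrund till och tolkning av resultaten".split()))
# [4]
```

Here till (to) is stopped because a conjunction follows it, while av (of) keeps its pointer function.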

Before the structuring rules are presented, the rules that the 
dictionary operates with are given in Box 4. In the English 
translation of the Swedish prepositions only one alternative is 
given, considered the most common in the contexts in question. 

It is assumed that the pointing and ordering functions of the 
prepositions are dependent on the order among them. The pointing 
forwards, as implied by the intention, consequently means that the 
method governing the "movement" is placed at the very beginning, 
"pushing" the other parts in front of it. The first focus of the 
method is the problem, if av (of) is the first preposition, and 
so on. Consequently, when all intentional components have been 
defined, the remainder is always coded as being the method. This 
basic principle is demonstrated in the following coded examples: 



Psykologiska analyser av militara befattningar (12)
(Psychological analyses of military posts)
40 30

Matningar med projektiva test (13)
(Measurements with projective tests)
40 80

Lashjalp for synsvaga (14)
(Reading aid for the visually handicapped)
40 70
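The basic principle can be sketched for the simplest case, in which a title contains only the main prepositions av (of), med (with) and for (for). In this Python sketch, the words before the first main preposition form the Method component (40), and each main preposition codes the following words as Problem (30), Instrument (80) or Goal (70); a title without such a preposition is assumed to be coded 30 as a whole. Stop rules, extensional prepositions and subordinate codes are deliberately left out, and the function name code_simple_title is hypothetical.

```python
MAIN_CODES = {"av": 30, "med": 80, "for": 70}   # Problem, Instrument, Goal


def code_simple_title(title: str) -> list[tuple[str, int]]:
    """Code a title whose only prepositions are the main ones."""
    chunks = [(None, [])]                 # (governing preposition, words)
    for tok in title.split():
        if tok.lower() in MAIN_CODES:
            chunks.append((tok.lower(), []))
        else:
            chunks[-1][1].append(tok)
    has_intention = len(chunks) > 1
    coded = []
    for prep, words in chunks:
        if not words:
            continue
        if prep is None:                  # words before the first main preposition
            code = 40 if has_intention else 30
        else:
            code = MAIN_CODES[prep]
        coded.append((" ".join(words), code))
    return coded


for t in ["Psykologiska analyser av militara befattningar",
          "Matningar med projektiva test",
          "Lashjalp for synsvaga"]:
    print(code_simple_title(t))
```

Applied to examples (12) to (14), the sketch yields the codes 40 30, 40 80 and 40 70.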
Box 4. Dictionary in automatic organization of concepts in titles

Prepositions are:

  i (in), av (of), for (for), pa (at), till (to), med (with), om (on),
  inom (within), vid (by), som (as), mot (towards), mellan (among),
  fran (from), enligt (according to), genom (through), rorande
  (concerning), infor (at), under (under), hos (in), at (to),
  ur (from), kring (around), over (on), bland (among), efter (after),
  angaende (concerning)

Multi-word prepositions are:                                   Function

  i anslutning till (in connection with)                        = i
  i samband med (in connection with)                            = i
  i relation till (in relation to)                              = i
  i fraga om (concerning)                                       = i
  med hjalp av (by means of)                                    = med
  med speciellt avseende pa (with special reference to)         = i
  med sarskild hansyn till (with special reference to)          = i
  med sarskild anknytning till (with special relation to)       = i
  med sarskild inriktning pa (with special focus on)            = i
  med tonvikt lagd vid (with emphasis on)                       = i
  med tonvikt lagd pa (with emphasis on)                        = i
  kombinerad med (combined with)                                = i
  jamford med (compared with)                                   = i

Multi-word phrases are:

  typer av (types of), slag av (kind of), grad av (degree of),
  former av (forms of), ramen for (the frame of), samvariation med
  (intercorrelation with), samverkan med (cooperation with),
  ett perspektiv av (a perspective of), synpunkter pa (views on),
  exempel pa (examples of), redogorelse for (account of),
  aspekter pa (aspects of)

Conjunctions are:

  och (and), eller (or), samt (and [also]), resp (respectively)

Main prepositions are:                        Operational definition

  av (of)
  med (with)
  for (for)
  om (on)                                     = av
  angaende (concerning)                       = av
  rorande (concerning)                        = av
  over (on)                                   = av






The concepts that in these titles have been arranged under the 
Method component are examples of activities that have been star- 
ted in order to solve a problem. Strategies, procedures, single 
events, etc., have been consolidated. 

Many titles do not have a Method component, namely those with- 
out a preposition. The same goes for titles with an initial prep- 
osition: 

Differentieringsfragan (15) 

(The differentiation problem)

30 

Om kunskap (16) 

(On knowledge) 

30 

When the title does not express an intention, i.e. when the struc- 
tural relations are implicit, what is referred to is only that a 
certain problem area is dealt with in some way. The preposition 
om (on) is not governed by a "pushing" method and has no function 
in the model. For similar reasons, a terminal preposition (al- 
though very rare in scientific titles) has no function from this 
point of view: 



Vad funderar barn pa? (17)
(What think children about)



This title just expresses that a problem is dealt with without any 
scientific specification of how or why. 

Since the system is based on distinguishing main concepts from 
subconcepts, the rules must also be constructed in such a way that 
titles of considerable length will be assigned a correct coding. 
A title of a certain length may include several instances of one 
and the same preposition. Regardless of what status the preposi- 
tions may have in the model, only one main component of the same 
kind can be activated. This can be illustrated by means of the 
following example: 

Utvardering av forsok med en variant av arskurslos undervisning (18)
(Evaluation of experiments with a variant of nongraded teaching)
40 30 80 83

The last instance of av (of) determines only the variables of the 
instrument. 
Below, the algorithm for automatic coding of the concepts in the
titles is presented in a variant of natural language. The program
is written in ASCII FORTRAN.

Rule 1. Control multi-word prepositions

Rule 2. Control multi-word phrases

Rule 3. Preposition and conjunction within ( ) do not operate

Rule 4. A preposition as first word does not operate

Rule 5. If a conjunction connects two prepositions = the
        conjunction and the second preposition do not operate

Rule 6. If a preposition is followed by a conjunction = the
        preposition does not operate

Rule 7. If a conjunction is followed by a preposition = the
        preposition does not operate

Rule 8. If a preposition is followed by a preposition = the
        first preposition does not operate

Rule 9. If med (with) or for (for) are repeated, the second
        instance is subordinated

Rule 10. Only one of av (of), om (on), rorande (concerning),
         angaende (concerning) or over (on) can be the main
         preposition in a clause. The first instance becomes the
         main preposition. A "clause" is demarcated by a
         conjunction or a comma

Rule 11. A preposition as terminal word does not operate

Rule 12. Main rule for initial prepositions

         40   av (of), om (on), rorande (concerning),
              angaende (concerning), over (on)            30
         40   for (for)                                    70
         40   med (with)                                   80
         30   not main preposition                         33

         Only the first preposition in a clause points forwards

Rule 13. Main rule for non-initial prepositions

         av (of)                30   not main preposition   33   not main preposition   34
         for (for)              70   not main preposition   73   not main preposition   74
         med (with)             80   not main preposition   83   not main preposition   84
         not main preposition   33   av (of)                34

Rule 14. A conjunction connects expressions of the same type
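Rules 4 to 14 can be combined into a single coding function for one demarcated conceptualization, sketched here in Python. This is an illustrative reconstruction, not the ASCII FORTRAN program: it assumes that Rules 1 to 3 have already been applied, uses partial word lists, treats the whole input as one clause (clause boundaries inside a conceptualization, Rule 10, are not handled), and numbers subordinate concepts by the increments read off the tables of Rules 12 and 13 (30, 33, 34, and correspondingly for 70 and 80). The function name code_clause is hypothetical.

```python
import re

AV_GROUP = {"av", "om", "rorande", "angaende", "over"}     # Rule 10: coded 30
MAIN = {**dict.fromkeys(AV_GROUP, 30), "for": 70, "med": 80}
OTHER_PREPS = {"i", "pa", "till", "inom", "vid", "mot", "fran", "mellan",
               "genom", "under", "hos", "kring", "bland", "efter", "enligt"}
CONNECTORS = {"och", "eller", ","}


def code_clause(title: str) -> list[tuple[str, int]]:
    toks = re.findall(r"[\w'-]+|,", title)
    low = [t.lower() for t in toks]
    n = len(low)
    is_prep = [w in MAIN or w in OTHER_PREPS for w in low]
    is_conn = [w in CONNECTORS for w in low]

    def operates(k: int) -> bool:
        if not is_prep[k] or k == 0 or k == n - 1:          # Rules 4 and 11
            return False
        if is_prep[k + 1] or is_conn[k + 1]:                # Rules 6 and 8
            return False
        return not is_conn[k - 1]                           # Rules 5 and 7

    # Each operating preposition opens a segment; an operating connector
    # splits a segment into parallel concepts sharing one code (Rule 14).
    segments = [(None, [[]])]
    for k, w in enumerate(low):
        if operates(k):
            segments.append((w, [[]]))
        elif is_conn[k] and not (0 < k < n - 1 and is_prep[k - 1] and is_prep[k + 1]):
            segments[-1][1].append([])                      # Rule 5 excluded
        else:
            segments[-1][1][-1].append(toks[k])

    first = segments[1][0] if len(segments) > 1 else None
    head = 40 if first in MAIN else 30                      # Rule 12
    used = set()                                            # activated main codes
    current = None if head == 40 else 30
    coded = []
    for prep, concepts in segments:
        if prep is None:
            code = head
        elif prep in MAIN and MAIN[prep] not in used:       # Rules 9, 10, 12
            code = MAIN[prep]
            used.add(code)
        else:                                               # Rule 13
            base = current if current is not None else 30
            code = base + 3 if base % 10 == 0 else base + 1
        if prep is not None:
            current = code
        coded.extend((" ".join(c), code) for c in concepts if c)
    return coded


print(code_clause("Utvardering av forsok med en variant av arskurslos undervisning"))
# [('Utvardering', 40), ('forsok', 30), ('en variant', 80), ('arskurslos undervisning', 83)]
```

Applied to examples (11), (18), (19) and (21), the sketch gives the codes stated in the text; the second av (of) in (18) is subordinated to the instrument, as Rule 10 requires.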

It should be emphasized here that these rules have been developed
for the testing of the model. A more detailed description of them
can be found in a separate publication (Bierschenk, Bierschenk &
Sternerup-Hansson, 1979). In the account of the results of the
empirical test given in Chapter 6, the way in which the
algorithmic analysis has been performed will be made clear.
5. STRUCTURES IN A REPRESENTATIVE DATA BASE






The testing of an analytical method of the kind presented here 
cannot be performed without a data base. Depending on the purpose 
of the analysis, it can be organized in different ways and creat- 
ed by means of different techniques. The data base to which the 
method under discussion has been applied differs in some respects 
from the bases on which analyses in information science are usu- 
ally based. 

The data base is experimental. It contains (1) all bibliographic 
data concerning scientific documents produced by a sample of re- 
searchers during a period of 40 years, (2) all references cited 
by these researchers in the documents from this period, and (3) 
a linkage between the references given in the documents and an 
extensive interview material (4000 typed pages) concerning the 
researchers' grant-supported activities. A more detailed account 
of the data base is given below. 



5.1 Organization of the data base 

The goal of developing a method for the generation of an
intermediate language, with the capacity to convey information
within a certain field of application (here research within
education), requires a decision regarding who or what should
represent the
field. Based on results presented in B. Bierschenk (1974), the 
definition of the researcher population employed here includes 
psychologists, educationists and sociologists. After "researcher" 
had been defined, a random sample was drawn from the resulting 
population. The knowledge represented by these researchers (B. 
Bierschenk, 1979) can be retrieved in non-fugitive form from their 
written works (cf. Chapter 1), which have all been collected, 
starting from their first scientific product (Ph. L. or Ph. D. 
thesis) . Thus this sample of works may be regarded as representa- 
tive of the focus of attention in research of relevance to 
Swedish education. 

As a rule, each work includes a so-called reference list of the 
various sources and publications drawn on. The references listed 
may give a certain idea about the kind of research information 
that has been of relevance. It is possible to study and control 
the central vocabulary in the references given. This constitutes
a kind of content analysis method used in document description 
(see Taulbee, 1968). 

An experimental data base has been set up, containing
bibliographic descriptions of the works and their respective
references, connected through identification codes. The
descriptions are built up according to one of the international
standards (American Psychological Association, 1965). To allow the
processing of single pieces of information, they have been divided
into several fields. The different types of information are
presented in Box 5.



Box 5. Representation of bibliographic information for
computational processing

Field   Information

1       Name of author with initials for first name and
        information about author function, e.g. Ed.
2       Title and subtitle of a work
3       Place of publishing
4       Publishing company
5       Year, volume, number, page
6       Name of journal, series, mimeograph
7       Other characteristics of works



Each bibliographic reference can be unambiguously identified. The 
identification code specifies author, sequential order of works, 
sequential order of references (when given), field entry number, 
and sequential order within field (when more than 80 columns have 
been used) . 
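The identification code can be sketched as a small record type. The field widths and their order below are hypothetical, chosen only for illustration; the text specifies which items the code contains (author, sequential order of works, sequential order of references, field entry number and sequential order within a field), not how they are laid out. The class name RecordId and the convention that reference 0 denotes the work itself are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordId:
    author: int        # researcher in the sample
    work: int          # sequential order of the work
    reference: int     # sequential order of the reference (hypothetical: 0 = the work itself)
    field: int         # field entry number, 1-7 as in Box 5
    continuation: int  # order within the field when more than 80 columns are used

    def code(self) -> str:
        """Render the identifier in a hypothetical fixed-width layout."""
        if not 1 <= self.field <= 7:
            raise ValueError("field entry number must be between 1 and 7")
        return (f"{self.author:03d}{self.work:03d}{self.reference:04d}"
                f"{self.field:01d}{self.continuation:02d}")

    @classmethod
    def parse(cls, code: str) -> "RecordId":
        """Inverse of code() for the same hypothetical layout."""
        return cls(int(code[0:3]), int(code[3:6]), int(code[6:10]),
                   int(code[10]), int(code[11:13]))


title_of_work = RecordId(author=12, work=5, reference=0, field=2, continuation=1)
print(title_of_work.code())  # 0120050000201
```

A fixed-width layout keeps the identifiers sortable as plain strings, which suits card-image records of 80 columns.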

For an analysis of the kind for which this model was developed,
only field 2 (titles) among the bibliographic characteristics is
of any concern. Field 7 contains information about number of
pages, language used, document type, and the like. A detailed
description of the works, the coding and control is given in
B. Bierschenk (1979, Chapter 2 and Appendix). But some numerical
data that may be of special interest will be given here.

The number of works stored in the base is 949, of which 660
(69.5 %) are written in Swedish. The English works number 249
(26.2 %), the German works 26 (2.7 %), works written in other
languages being represented by 14 titles (1.4 %).

The number of references cited is 23,141. Here, English is the
dominant language, represented by 12,345 (53.4 %) titles, followed
by Swedish, 8,865 (38.3 %). German references total 1,153
(5 %), the figure for titles in other languages being 778 (3.3 %).
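The shares can be recomputed from the counts as a check; the counts add up to the stated totals of 949 works and 23,141 references. Rounding each share to one decimal, as in this Python sketch, reproduces most of the reported percentages; three differ by 0.1 percentage point (1.5 rather than 1.4, 53.3 rather than 53.4, and 3.4 rather than 3.3), presumably because of the rounding convention used in the original computation. The function name shares is hypothetical.

```python
works = {"Swedish": 660, "English": 249, "German": 26, "other": 14}
references = {"English": 12345, "Swedish": 8865, "German": 1153, "other": 778}


def shares(counts: dict[str, int]) -> tuple[int, dict[str, float]]:
    """Total count and each language's share in percent, to one decimal."""
    total = sum(counts.values())
    return total, {lang: round(100 * n / total, 1) for lang, n in counts.items()}


print(shares(works))       # (949, {'Swedish': 69.5, 'English': 26.2, 'German': 2.7, 'other': 1.5})
print(shares(references))  # (23141, {'English': 53.3, 'Swedish': 38.3, 'German': 5.0, 'other': 3.4})
```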

In order to create manageable and uniform sets of data and for- 
mats for several kinds of studies, the document descriptions in 
the work base and the reference base have been sorted according 
to the language in which they are written. Field 7 provides such 
information about the works, and a program has been developed for 
automatic grouping of references according to language in the 
reference base (I. Bierschenk, 1978) . For testing the automatic 
coding of information in titles, the Swedish work base and the 
Swedish reference base were used. 

Since, for an investigation of titles, it may be of interest to 
examine whether there is any correlation between document types 
and titles, some variables relevant to such a description are 
presented below. Those variables for which values exist are
listed in Box 6.



Box 6. Types of document represented in research on education

Research report
Article in research journal
Article in daily or professional press
Monograph
Preface
Chapter in book, edited by someone else
Symposium publication
Bibliography
Mimeograph
Official Governmental Reports
Textbook
Read paper, invited address



The document types "research report" and "mimeograph" can be
distinguished in that reports refer mainly to project reports,
published in reviewed series and printed in various departments,
whereas mimeographs refer to the kind of "grey reports" that do
not have this formal status.



5.2 Patterns in titles 

The coding rules presented in Chapter 4 were constructed after
tests had been performed on the Swedish titles of the work base.
After some initial test runs had been checked, the demarcation
rules were specified. For the construction of the structuring
rules, the demarcations had to be edited in such a way that a
full stop was surrounded by blanks. Continued testing then showed
where other changes were needed. A character is not supposed to
carry more than one function, which required editing that
discriminated between dashes and hyphens. A dash is regarded as
having a demarcating function, and so it had to be surrounded by
blanks. Compare the following variation in coding:



En studie av kreativitetsutvecklingen inom arskurserna 4-9 (19)
(A study of creativity development within the grades 4-9)
40 30 33

alternatively ... 4-9
                    30



In order to prevent the number 9 from being coded as a sentence 
of its own (which is the consequence when a concept is single) , 
a hyphen had to be inserted. 

Dashes also had to be exchanged for commas as in the following 
example: 

Larares erfarenheter fran forskola - lagstadium - fritidshem (20)
(Teachers' experiences from pre-school - primary school - centres
for children's leisure activities)
30 33 30 30

This title is "incorrectly" constructed, and so the demarcation 
mark was exchanged for a character denoting connectivity. Thus, 
the coding should instead be: 



Larares erfarenheter fran forskola, lagstadium, fritidshem (21)
(Teachers' experiences from pre-school, primary school, centres
for children's leisure activities)
30 33 33 33
Editing among non-alphabetic characters also concerned the colon,
whose function is that of a demarcator. When it denoted the
genitive, as in "SIA:s organisation" (SIA's organization), it was
deleted, the genitive marker s thus being directly connected to
the word stem instead. Further, a comma was replaced by an
apostrophe when functioning as a decimal point ("betyget 2,3"
(the grade 2.3) was changed into "...2'3"). Certain checks and
edits were performed interactively via a terminal connecting the
Department of Educational and Psychological Research in Malmo to
the Computer Centre in Lund through UNIVAC's CTS (Conversational
Timesharing System). Editing involving systematic changes, e.g.
moving punctuation marks, was carried out automatically.
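The systematic part of these pre-edits can be approximated by a small normalization pass. The sketch below is hypothetical: the name `normalize_title` is ours, and it assumes that a dash is typed as an en dash (U+2013) while a hyphen is the ASCII "-", a distinction the original editing had to establish by hand. Cases that needed judgement, such as replacing dashes by commas in (20), are not automated.

```python
import re


def normalize_title(title: str) -> str:
    """Hypothetical pre-editing pass mirroring the edits described above.

    Assumption: a dash is typed as an en dash (U+2013), a hyphen as "-".
    """
    # Decimal comma -> apostrophe, so that a comma only ever marks
    # connectivity: "betyget 2,3" -> "betyget 2'3".
    title = re.sub(r"(?<=\d),(?=\d)", "'", title)
    # Genitive colon deleted, the s joined directly to the stem:
    # "SIA:s" -> "SIAs".
    title = re.sub(r"(?<=\w):(?=s\b)", "", title)
    # Dash -> demarcating " - "; hyphens inside words ("4-9") are kept.
    title = title.replace("\u2013", " - ")
    # A full stop ending a word is surrounded by blanks.
    title = re.sub(r"\.(?=\s|$)", " . ", title)
    # Collapse the extra blanks introduced above.
    return " ".join(title.split())
```

Keeping each character to a single function is what lets a later, purely mechanical coder treat " - " and " . " as demarcators without ambiguity.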

After these checks and edits the rules were tested once
again, and then the phrase dictionaries and the stop rules were
specified.

The checking steps now concern the ability of the rules to
operate correctly on the material. The content of the respective
codes will be discussed in connection with its relevance to a
thesaurus (Chapter 6).

In the following the patterns emerging from the titles will be 
presented. According to the analytical model the conceptualiza- 
tions may be more or less explicitly stated. In the most explicit 
case they are represented by the components 30 + 40 + (80) + 70, 
together with possible attributes. Thus a pattern is a structural 
representation of a conceptualization (see the demarcation rules), 
which implies that certain patterns are possible, common, uncom- 
mon, or impossible. Furthermore, there is an in-built restriction 
in the system, among other things due to the sequential order of 
the subordinate codes. 

For a quantitative description of patterns, "profiles" were 
printed containing all the existing types together with frequency 
counts. They showed that there are 85 different patterns, 42 of
which (50 %) are unique. The latter types are not very suitable
for a quantitative description of the material. Instead, the fo- 
cus of interest is on certain recurrent patterns, so as to make 
it possible to discover regularities, which is a prerequisite for 
the development of algorithms for automatic analyses. 
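A profile of this kind is essentially a frequency count over code sequences. The sketch below assumes that each coded title is available as a list of code strings; `pattern_profile` and `summarize` are illustrative names, not part of the original system.

```python
from collections import Counter


def pattern_profile(coded_titles):
    """Count how often each code pattern occurs.

    coded_titles: one code sequence per title, e.g. ["40", "30", "33"].
    """
    return Counter(" ".join(codes) for codes in coded_titles)


def summarize(profile):
    """Return (number of distinct patterns, number occurring only once)."""
    unique = sum(1 for n in profile.values() if n == 1)
    return len(profile), unique


if __name__ == "__main__":
    titles = [["30"], ["30"], ["30", "33"], ["40", "70"], ["40", "30", "33"]]
    profile = pattern_profile(titles)
    print(profile.most_common())
    print(summarize(profile))  # (4, 3) for this toy input
```

Applied to the work base, the two numbers returned by `summarize` would correspond to the 85 patterns and the 42 unique ones reported above.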

Therefore, it was decided that the reference base should be 
used as a control base. The work base is to a certain extent a 
subgroup of the reference base, and so the pattern profiles were
also counted on the reference base. The references represent 241 
patterns, 89 of which are unique. The number of different pat- 
terns in the references is higher than in the work base. At the 
same time, however, the reference patterns are characterized by 
a larger number of common features than the works, since only a 
third of them are unique. A comparison between the patterns in 
the two bases thus makes it possible to estimate, with some degree 
of confidence, the consistency of the general features of the 
work profiles. 

In order to determine the commonality of patterns in the works
and the references, a lower limit was needed. The works have been
produced by 40 researchers. If the same pattern occurs either
four times in the works of one person or, conversely, once in
each of four persons' works, this corresponds to a frequency of
10 % in the sample. Frequencies under 10 % may be regarded as
random variation. Therefore, it was decided that only patterns
with a frequency of 5 or more should be considered in the
comparison.

The patterns of the works were ordered according to the ranks
of the respective patterns in the references. Spearman's rank
correlation was calculated and found to be high (r_s = .89). With
the frequency criterion of 5, the work patterns resulted in 19
distinct places of rank. Table 1 shows the result of this comparison.

From Table 1 it is readily seen that to a great extent regular 
patterns exist. Moreover, the result is based on some 9,500 ti- 
tles, quite a high number in this kind of study. The first six 
patterns may be considered the most typical and the most predict- 
able. Variations are marginal. The first difference between the 
bases is the pattern at work rank 7, the second at rank 13.5, and 
the third at 15. 

The general result of the comparison is that differences between
works and references may appear when there is more than one
extension. It does not seem to matter whether the Method
component is activated or not. These patterns represent such
specific titles as can be found in research reports and mimeo-
graphs (cf. Box 7 in the following section). That such works are
not cited as often as books may, among other things, be due to
the fact that they are not as accessible as books. The pattern
can be exemplified with






Rattstavningsformagans struktur hos pojkar och flickor i arskurs 4 (22)
(Correct spelling ability structure in boys and girls in grade 4)
30 33 33 34









Table 1. Patterns in titles: comparison of ranks in work base
and reference base

                          Rank order
Patterns           Works       References

30                   1            1
30 33                2            3
30 30                3            2
40 30                4            4
40 30 33             5            6
40 70                6            5
30 33 34             7           10
30 30 33             8            8
30 30 30             9.5          7
30 33 33             9.5          9
40 80               11           11
40 30 30            12           13
40 30 33 33         13.5         18.5
40 30 33 34         13.5         12
30 33 33 34         15           23.5
30 30 33 33         16.5         16.5
30 33 33 33         16.5         15
30 33 34 34         18           16.5
40 70 70            19           20






In view of the covariation of patterns within the reference
base and the high correlation between the bases, only the title
patterns in the work base will be studied further. The analysis,
therefore, is focused on the patterns themselves in the context
of the coding rules.
Problems in the form of single concepts of the type shown in
examples (15) and (16) in Chapter 4.3 constitute the most
frequent pattern, followed by a combination of two Problem
concepts or a Problem together with one extension. The more
complex the patterns are, the less often they appear. A pattern
simultaneously activating all components in the analytical model
does not seem to exist. At most two intentional components are
activated in one and the same title: most often Method + Problem,
next Method + Goal, and then Method + Instrument. Further, when
activated, Method always comes first. Problem is the only other
component that can also be activated initially. These observations
lead on to the question of regularities within the patterns,
regardless of frequency. For a study of the activation patterns of
the components, a matrix was set up showing the sequential order
of codes as seen throughout pattern types in the work titles. This
transition matrix is given in Table 2.




Table 2. Patterns in titles: transition matrix 



30 



33 34 



35 



36 37 



40 



73 74 75 76 



80 83 



84 



85 86



87 



JSL 






747 










233 








30 


241 


363 








2 


6 






33 




71 


77 






5 


4 






34 






17 


22 






1 






35 








6 


5 


1 








36 










2 


1 








37 




















40 


167 










13 


49 






70 


8 










1 


6 


17 




73 
















10 


8 


74 


















1 


75 




















76 




















77 




















80 














7 






83 














1 






84 














i 






85 




















86




















87 





















26 



14 




546 

270 

52 

15 

4 

1 



42 

9 

7 

1 



20 
9 
3 
1 



The matrix shows that only codes 30 and 40 appear initially. The
concentration in the left corner indicates the dominance of the
Problem component. It is also apparent that the Method component
is followed only by a main component, mostly 30. The main compo-
nents are more frequently followed by an extension than by other
main components. When Goal (70) is followed by 30, it may be
assumed either that a coordinate construction is involved, i.e.
the Problem component initiates the following conceptualization,
or that a concept with a double function is present (see example
(49), Chapter 6). An extensional code is frequently repeated or
followed by the next code in the sequential order; the lower its
sequential number, the more often this is the case. This should
imply that the more extensions a concept has, the less likely it
is that a transition to a new main component is performed. This
is also expressed in the final column, where it can be seen
directly which components are more frequent in terminal positions
than in others. It should be noted that Method does not appear in
terminal position, a logical consequence of the fact that Method
is not preceded by a preposition.
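A transition matrix like Table 2 can be obtained by counting adjacent code pairs. In this sketch the pseudo-codes `START` and `END` (our labels) record which codes open a title and which close it, corresponding to the initial occurrences and the terminal positions discussed above. Passing one sequence per title weights each pattern by its frequency; passing each pattern type once would instead count types.

```python
from collections import defaultdict

START, END = "start", "end"


def transition_counts(patterns):
    """Count code-to-code transitions.

    patterns: iterable of code sequences, e.g. ["40", "30", "33"].
    Returns counts[a][b] = number of times code b directly follows code a.
    The pseudo-codes START and END record initial and terminal positions.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for codes in patterns:
        seq = [START, *codes, END]
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts


if __name__ == "__main__":
    c = transition_counts([["30"], ["30", "33"], ["40", "30", "33", "34"]])
    for code, row in c.items():
        print(code, dict(row))
```

With such a table, the observation that Method never ends a title amounts to `counts["40"][END]` being zero.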

This transition matrix may now be summarized and visualized by
means of a graph. For this description the transitional proba-
bilities (from Table 2) were calculated, with a lower limit for
the calculation set at a row frequency of 10. It was further
decided that only proportions equal to or higher than .10 should
be considered in the interpretation. As a consequence, it is easy
to distinguish the general features of the patterns. The graph is
presented in Figure 5.
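The two thresholds used for the graph can be applied mechanically to a table of counts. In this sketch the default parameters mirror the row frequency of 10 and the proportion of .10 given above; the function name is ours, and the counts in the usage lines are invented for illustration, not taken from Table 2.

```python
def transition_probabilities(counts, min_row_total=10, min_prob=0.10):
    """Turn transition counts into row proportions.

    counts: mapping from a code to a mapping {next code: count}.
    Rows whose total is below min_row_total are skipped, and only
    transitions with a proportion of at least min_prob are kept.
    """
    probs = {}
    for code, row in counts.items():
        total = sum(row.values())
        if total < min_row_total:
            continue
        kept = {nxt: n / total for nxt, n in row.items() if n / total >= min_prob}
        if kept:
            probs[code] = kept
    return probs


if __name__ == "__main__":
    # Hypothetical counts, not the figures of Table 2.
    counts = {"40": {"30": 16, "70": 3, "80": 1}, "36": {"end": 2, "37": 1}}
    print(transition_probabilities(counts))
```

Here the row for 36 is dropped because it has fewer than 10 observations, and the transition 40 to 80 (a proportion of .05) is dropped as falling below .10; the surviving proportions are the edge weights of a graph such as Figure 5.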

The horizontally related nodes in Figure 5 indicate the inten- 
tion of the analytical paradigm, whereas the vertically related 
ones indicate extensions of the concepts. The broken-line nodes 
mark the place of possible arguments according to the model. The 
figures at the transition indicate the probability that a certain 
argument, when appearing, is followed by the next one. The arrow 
from a node back to itself indicates the reflexive functions of 
the model. 

Of all the possible patterns, few have been activated. The
pattern with the highest probability is represented by the link
between Method and Problem. There are also relatively high proba-
bilities of occurrence within almost the entire Problem complex
("complex" is here defined as "consisting of interconnected parts").
It then emerges that Goal is demarcated to a greater extent than
Instrument, but that the probability is higher that Instrument,
when appearing, has an extension, although only one. Connectivity,
however, does not appear at Instrument. Further, Instrument
"governs" Goal to a greater extent than Method "governs"
Instrument or Goal.




Figure 5. Graph description of titles 

As argued in Chapter 4.3, titles should reflect the activity 
of their authors. According to B. Bierschenk (1977), educational 
researchers express a desire for methodological intensification, 
but also prefer working with general problems. In other words, 
the problem orientation is obvious. The development of new meth- 
ods requires in many respects greater work intensity and time 
investment, which grant-supported research does not allow owing 
to the time limits imposed. This is probably also the reason why 
Instrument and Goal are not very often explicitly mentioned. De- 
marcating and defining problems in such a way that they become 
"researchable" requires so much of the project time that other 
activities in the research process are often suppressed. This
seems to be borne out by the patterns in the titles communicated 
by the researchers themselves. 

5.3 Degree of differentiation as a function of document type 

On the basis of the results accounted for in the preceding sec- 
tion, it appears natural to take a closer look at the works pro- 
duced by the researchers, i.e. an attempt will be made to deter- 
mine whether there are particular structures characterizing 
titles of specific types of documents. Since this type of research 
covers a wide range of activities, it should be expected that 
there is an interrelation between the construction of titles and 
the form of representation chosen. 

In Chapter 5.1 it was mentioned that the bibliographic informa- 
tion has been supplemented with non-bibliographic material, in- 
cluding, among other things, type of document. The types that are 
represented in the material and which will be used here have been 
presented in Box 6.

As a basis for the comparison, type of pattern according to
Table 1 (Chapter 5.2) is used. The same limit has been chosen,
i.e. the pattern must have a frequency of at least 5 throughout
all works in the Swedish language. But in order to prevent the
matrix from becoming too sparse, the structural relations that
have crystallized from the transition matrix and are visualized
in the graph can be employed. Thus the patterns should be grouped.

The first criterion for grouping concerns the two main patterns,
namely the difference between Problem (30) and Method (40). In
this way patterns with and without 40 are distinguished. Then the
patterns are analysed according to intentionality or extension-
ality, indicating structural complexity at different levels. The
40 type is process-oriented, the 30 type problem-oriented, which
means that the former relates phenomena whereas the latter de-
scribes and demarcates one and the same phenomenon. Further dif-
ferentiation then results first in the groups 40 + 30, 40 + 70,
and 40 + 80, representing explicit intentional relations between
concepts. The corresponding pattern of the other type is a single
30, since the problem orientation is characterized by implicit
intention. Thus explicitly stated intentionality forms one main
group, and implicit intentionality forms the other. The degree of
complexity is not assumed to increase with connective relations.
This implies that combinations such as 40 + 30 + 30 and 30 + 30 +
30 within the respective groups are allowed (see the variants in
Table 1, Chapter 5.2).

A further distinction is now necessary. It concerns the pres-
ence of extensions within each main group. Among the patterns in
the first group it can be seen that only the 40 + 30 type is
followed by extensions. If the connectivity rule is to be follow-
ed (as it should be), the pattern 40 + 30 + 33 is also formed. As
a consequence the pattern 30 + 33 is given within the other main
group.

The degree of complexity grows according to the concentric 
principle. Thus one more pattern in each main group can be formed, 
namely 40 + 30 + 33 + 34 and 30 + 33 + 34, respectively. 
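The grouping criteria can be written down as a small classifier. This sketch rests on a reading of the codes that the section implies but does not spell out: the first digit names the component (3 Problem, 4 Method, 7 Goal, 8 Instrument), a second digit of 0 marks the main concept, and 3, 4, ... mark the first, second, ... extension, so that 30 + 33 + 34 has second-degree extensionality. The names `decode` and `classify` are ours.

```python
COMPONENTS = {"3": "Problem", "4": "Method", "7": "Goal", "8": "Instrument"}


def decode(code):
    """Split a two-digit code into (component, extension level).

    Assumed reading: the first digit names the component, a second digit
    of 0 marks the main concept, and 3, 4, ... mark the first, second, ...
    extension (so 33 = first extension of a Problem, 84 = second extension
    of an Instrument).
    """
    component = COMPONENTS[code[0]]
    level = 0 if code[1] == "0" else int(code[1]) - 2
    return component, level


def classify(pattern):
    """Place a pattern such as "40 30 33" in the grouping described above.

    Returns (intentionality, extension degree). Connective repetitions
    (30 30, 33 33) do not raise the degree.
    """
    codes = pattern.split()
    intentionality = "explicit" if "40" in codes else "implicit"
    degree = max(decode(c)[1] for c in codes)
    return intentionality, degree


if __name__ == "__main__":
    for p in ["30", "30 33", "30 33 34", "40 30", "40 30 33", "40 30 33 34"]:
        print(p, classify(p))
```

Because the degree is the maximum extension level rather than a count of codes, connective variants such as 30 + 30 + 30 fall into the same group as a single 30, in line with the assumption that connectivity does not add complexity.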

A comparison between document types will now be made with these 
groups as a starting-point. Eight groups could be discerned. They 
were ranked according to the frequency of the entire group. The 
result is shown in Table 3. 

Table 3. Proportions of pattern types for single document types 
10


11 
If


1 


30 




01 


02 


08 


14 


04 


25 


35 


06 


00 


01 


02 




03 


468 


2 


30 33 




01 


02 


10 


15 


03 


14 


40 


05 


01 


02 


01 




05 


206 


3 


40 30 




02 


03 


13 






13 


53 


02 






02 




12 


60 
40 30


33 






04 


07 




04 


71 












18 


45 
30 33


34 






06 


20 


03 




49 




03 




11 




09 


35 
40 70
11


03 


34 


17 


20 
35
40 80
08


38 


31 
13
40 30


33 34 
The table presents a ranking order of pattern types and a classi-
fication of document types. If each document type is regarded as a
sample from the work base, it may be of interest to study which
pattern is the most typical of the respective samples. In order
to find these patterns it is appropriate to standardize the fre-
quencies with which the patterns appear. Such a standardization
should be performed for each pattern, i.e. across the categories.
The first pattern is thus estimated at 3/468 ... 15/468. The method
for determining the typical pattern is as follows:

First the highest proportion for a certain pattern is looked up 
across the columns. Then it is determined for which column the 
pattern is the most typical. For example, the pattern 30 + 33 + 
34 is typical of the categories 4 and 11. But since .20 is a 
higher value than .11, it can be concluded that for category 4 
this pattern is the most typical. 
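The standardization and the two-step determination can be sketched as below. This is one reading of the procedure, chosen because it is consistent with the 30 + 33 + 34 example (typical of categories 4 and 11, assigned to category 4 because .20 > .11); the handling of ties and the exact order of the steps in the original are assumptions, and the function names and toy counts are ours.

```python
def standardize(counts):
    """Divide each pattern's counts by its own total, i.e. across categories.

    counts: {pattern: [count in category 1, count in category 2, ...]}
    """
    return {p: [n / sum(row) for n in row] for p, row in counts.items()}


def typical_patterns(props):
    """Assign each pattern to at most one category in which it is typical.

    Step 1: in each category (column), find the pattern with the highest
    proportion.
    Step 2: if a pattern wins several columns, keep the column where its
    proportion is highest.
    Patterns that win no column are not typical of any category.
    """
    n_categories = len(next(iter(props.values())))
    wins = {}
    for c in range(n_categories):
        best = max(props, key=lambda p: props[p][c])
        wins.setdefault(best, []).append(c)
    return {p: max(cols, key=lambda c: props[p][c]) for p, cols in wins.items()}


if __name__ == "__main__":
    # Hypothetical counts for three patterns over four categories.
    props = standardize({"X": [2, 6, 0, 2], "Y": [1, 1, 8, 0], "Z": [1, 0, 0, 0]})
    print(typical_patterns(props))
```

In the toy input, X wins two columns and is kept only for the one where its proportion is larger, mirroring the choice of category 4 over category 11. A pattern that is the column maximum nowhere is simply absent from the result, which is one way to read the observation below that type 30 is not typical of any category.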

This criterion gives the following general features. The pattern
type 30 cannot be said to be typical of any particular category.
The fact that titles can display more than one pattern (cf.
Chapter 4.3) may be the reason why this pattern type does not
differentiate between categories. The type 30 + 33 is very fre- 
quent, too. A Problem component, single or together with one ex- 
tension at most, appears in all the document types and so does 
not function as a typical pattern representing a certain sample 
of documents. 

When it comes to 40 + 30, it can be considered a typical pat- 
tern. It is the most typical pattern in titles which are headings 
of chapters in a book edited by someone other than the author of 
the title. These are often books containing scientific papers. 
Titles of books produced by the 40 authors, each title having 
been produced by a single author, are characterized by other typ-
ical patterns. The pattern 40 + 80 is typical of monographs,
whereas textbooks are of the 40 + 70 type. The monographs in this
material include some theses whose basic feature is the explicit
statement of the instruments and techniques used for the testing 
of a method. The textbooks investigated state explicitly for whom 
or what they are intended, i.e. the goal may be certain groups of 
persons, a grade in school, etc. 

Titles of "book types" typically contain a Method component + 
one more intentional component, which seems to vary systematically 
with document type.

The 40 + 30 + 33 pattern is typical of research reports. Thus
reports, as opposed to the others, have both intention and exten-
sion explicitly stated in their titles. There are also reports
with a higher degree of extensionality (the pattern 40 + 30 + 33
+ 34). This, however, is not the most typical situation according
to the method of determination employed here. A higher degree of
extensionality is in general expressed in the titles of scientif-
ic journal articles, but the typical pattern there is 30 + 33 + 34;
thus there is no explicit statement of intentionality.

Further, the proportions show that there are pattern similar- 
ities between research reports and mimeographs, between journal 
articles and Official Government Reports, and between monographs 
and textbooks. But according to the method used for the determi- 
nation of typical patterns, only five types are distinguished, 
differing from each other in so-called structural complexity. 
First, a group expressing explicit intentionality is determined 
through Method + one more main component. Second, a type pattern 
is determined having implicit intentionality and second-degree 
extensionality. Third, a further type pattern is found, charac- 
terized by explicit intentionality and first-degree extensionality.
The pattern structure thus emerging is summarized in Box 7. 



Box 7. Structural variation in titles: complexity characterizing
type of document

Group   Pattern type     Document type

1       40 + 70          Textbook
        40 + 80          Monograph
        40 + 30          Chapter in book
2       30 + 33 + 34     Journal article
3       40 + 30 + 33     Research report



The structural variation in the titles, as resulting from the
analysis performed, seems to be a consequence of the representation
form chosen. However, without detailed studies and experiments it
is difficult to determine the extent to which the conceptual
structures in the single groups correspond to the information they
are intended to communicate.

In the following chapter results from the automatic coding of 
the titles will be presented. Examples of various difficulties 
will also be given. 
6. CONCEPTS IN FUNCTIONALLY RELATED REGISTERS

In this chapter it will be demonstrated how the coding mechanism 
has worked out on authentic material. First the results of the 
operation strategy of the rules will be exemplified. Then the 
coding will be examined within the context of language structures. 
Finally, the concepts and their function in the registers that 
are to be generated will be presented and discussed. (For a def- 
inition of "register" in this context, see Chapter 1.) 

The presentation in the first section will be structured according
to the groups that were distinguished and described in Chapter 5.3.
But not all examples per group coincide with the document type of
which the pattern is typical. Titles from other document types will
also be included in the presentation in order to illustrate the
underlying conceptual variation of the PmG model. As has been
discussed in Chapter 4, a conceptual schema which does not require
detailed analyses of different elements in the language is adopted
here. It is important to keep this characteristic of the PmG
paradigm in mind when studying the results of the analysis.
6.1 Coding of functional relations

A significant result of the pattern structure analysis is that
most of the structures are of the types 30 and 30 + 33, including
connective structures, mainly the 30 + 30 variant. A single 30
pattern has been exemplified in Chapter 4 (examples (15) - (17))
and will not be repeated here. The principles of the restricted
coding, i.e. coding by means of paired numbers denoting super-
and subordination within the concept complex, were also explained
in Chapter 4. Example (24) below may serve as an illustration of a
computer output with that type of pattern, as compared to (23), in
which the concepts function connectively.



Psykologin och samhallet (23)
(Psychology and society)
30 30

Tonarsskola i utbildningssamhalle (24)
(Teenager school in educational society)
30 33
That (23) aims at discussing the relationship between psychology
and society is certainly indisputable, and as long as the author
of the title does not supply any further information, the two
phenomena should be treated together and thus be coded as two
concepts of the same order. It is also likely that such a title
may entail a discussion of society from a psychological perspec-
tive, i.e. the title "Samhallet och psykologin" (Society and
psychology) could be as relevant as the original one. "Psykologin
i relation till samhallet" (Psychology in relation to society),
however, would more clearly indicate the intended focus. In
example (24) the extension (code 33) denotes the scope (context)
within which the problem area tonarsskola (teenager school) is
discussed. Since none of the titles within these patterns cause
any trouble, and further demarcations connectively marked by och
(and) or a comma do not change the result within these patterns,
they will not be considered any further here. Instead, the dis-
cussion will concentrate on the pattern groups described in the
previous section (see Box 7).

The first group consists of a pattern type including Method and 
Goal (40 + 70), Instrument (40 + 80), or Problem (40 + 30), thus 
representing titles with explicit intentionality. Structurally
these three are similar, so there is no ranking order here. The 
order of exemplification has been chosen according to the degree 
of explicitness in the model. 



Studieteknik for vuxna (25)
(Study techniques for adults)
40 70

Mal for lararutbildning (26)
(Goals for teacher training)
40 70

Nagra program for elektronisk databehandling (27)
(Some programs for electronic data processing)
40 70



A typical concern in this population of researchers is to develop
teaching materials and guides of several kinds. In general, they
address themselves to explicitly stated goal populations (25).
Vuxna (adults) is thus the group towards which the methodological
work is directed. Another common activity is to produce goal
descriptions (26) for various teaching and training purposes. The
concept mal (goals) as a representative of a method may seem
strange. But malbeskrivning (goal description), i.e. the activ-
ities involved in att beskriva mal (describing goals, to de-
scribe goals), has a clearly methodological meaning within what is
called educational technology. Here, the goal is to provide
lararutbildning (teacher training) with a set of operationalized
goals. Example (27) is an example of a methodological activity
with a more research-oriented goal.

Some kind of instrument is often involved in the activity, even 
if not explicitly stated. Some such coding results are given 
below. 

Effektivare traning med videobandspelare (28)
(More effective training with videotape recorder)
40 80

Forsok med tva olika typer av ordlistor (29)
(Experiments with two different types of word lists)
40 80

Familjeterapi med alkoholskadade foraldrar (30)
(Family therapy with alcoholic parents)
40 80

These examples show different kinds of concepts in an instrumental
function. Technical aids are typical instruments in education and
teaching (28). Educational research has during the last 15 years
been characterized by classroom experiments. As a model for test-
ing and evaluation it has often used prototypes of teaching mate-
rials (29). Both instructional and educational instruments, as
well as research instruments, are therefore to be considered
means of attaining goals which in this context are implicit.

Another method represented is that of researchers involved in
therapy (30). In this example a group of persons functions as
instrument. It can be noted that the title "Familjeterapi for al-
koholskadade foraldrar" (Family therapy for alcoholic parents)
would have implied that the persons themselves had been the ob-
jective focused on (i.e. the goal population). In example (30)
the instrumental function indicates that the method as such (and
its possible effects) is in focus. Alkoholskadade foraldrar
(alcoholic parents) have, in the development and discussion of
the therapeutic method, the function of means. In empirical
research the variables of analysis form the instrument itself. A
title like "Integrering av barn med handikapp" (Integration of
children with handicaps) is thus an example of a med (with) phrase
coded as being instrumental, and not associative (cf. the
discussion in Chapter 4.3).

Finally, some examples from the third variant in this pattern 
group are given. 

Matning av mental prestationsutveckling (31)
(Measurement of mental performance development)
40 30

Reflexioner om vardagsinlarning (32)
(Reflections on everyday learning)
40 30

Grunddragen av den svenska militara undervisningens historia (33)
(An outline of the history of Swedish military education)
40 30

Several kinds of measurement are connected with the more experi-
mentally oriented part of this population of researchers (31).
Mental prestationsutveckling (mental performance development)
is presumed to be a well-defined problem, since it can be meas-
ured. What is meant by vardagsinlarning (everyday learning) (32)
does not yet seem to lead anywhere further than to reflections.
Here the choice of preposition is important. It is even more
important in example (33). If the author had used i (in), the
pattern would have been 30 + 33, and grunddragen (outline) would
have been coded as problem. With knowledge of this particular
author's field of activity it can be said that the method is to
dra ut (lay bare) grunderna (the outlines) of the history of
Swedish education. The concept grunddragen (outline) is a Swedish
example of how events and single acts have been condensed and
transformed into a label for a conceptualization that cannot be
decomposed. Moreover, these three method concepts may be regarded
as examples of three scientific approaches which very likely
would not have emerged by means of other methods of analysis.





The pattern type analysed has not caused any difficulties in the
coding, either in its single form or in its compound variants.

The second pattern group consists of a pattern type including a
problem and two demarcations (30 + 33 + 34), which means a pattern
representing titles with implicit intentionality and second-degree
extensionality.

Enkatsvar fran klasslarare och klinikforestandare i arskurs 3 (34)
(Questionnaire responses from school teachers and clinic directors
in grade 3)
30 33 33 34

Pedagogiska problem vid undervisning av sarskoleelever (35)
(Pedagogical problems in the education of disabled pupils)
30 33 34

Vagen till och genom gymnasiet i Sverige (36)
(The way to and through the gymnasium (grades 10-12) in Sweden)
30 33 34

The first example (34) reflects a common activity among the re- 
searchers, i.e. the use of questionnaires. In this case, however, 
the questionnaires themselves are the objective of the report, 
i.e. the responses, which have been carefully specified. Example 
(35) focuses on educational problems, not on the pupils them-
selves. Sarskoleelever (disabled pupils) is used in this title to
specify the problem. Example (36) emphasises the "pathway" fol- 
lowed in Swedish "gymnasial" (upper secondary school) studies. 

This pattern type displays greater complexity than the one dis-
cussed earlier, implying that more rules are activated in the
automatic coding (see Box 4, Chapter 4.3) and also that there is
a risk of misinterpretation. For example, rule 13 has operated in
title (35). When the problem has been coded after vid (in) has
operated, av (of) cannot point to a problem. The concept demar-
cated by av (of) instead becomes subordinate to the first exten-
sion. In title (36) a stop rule has operated. The coordination of
two determinations through a combination of two prepositions is
handled in the coding procedure in such a way that the first
becomes a "pointer" (according to rule 5).

In this pattern group the procedure has resulted in only one 
coding error. Consider the following example: 
Det fria tillvalet pa grundskolans hogstadium och vagen till gymnasiet (37)
(The free subject choice in upper comprehensive school and the way
to the gymnasium)
30 33 33(30) 34(33)

The correct code numbers are given within parentheses. This title
expresses two conceptualizations connected by a conjunction. Both
are of the same type, i.e. 30 + 33. Because the coding process
goes from left to right, the algorithm has not been able to handle
this: the second conceptualization is not recognized as such
before och (and) operates. The unit after och (and) is then
coordinated with the part preceding it, and till (to) is ordered
relative to that preceding concept. This title may be compared
with a correctly coded title which also shows relatively great
structural complexity:

En studie av kreativitetsutvecklingen inom (38) 

(A study of creativity development within 
40 30 33 

arskurserna 4-9 samt en undersokning av 
grades 4-9 and [also] an investigation of 

40 30 

kreativitetens samvariation med intelligens 
creativity's intercorrelation with intelligence) 

Rule 10 states that only one main preposition of the av (of) type
can be present in a clause. Here it defines "clause" adequately
and prevents samt (and [also]) from being coded as a connector
between arskurserna (grades) and undersokning (investigation). The
conceptualization after samt (and [also]) contains a main
preposition, which according to rule 12 operates backwards (40
before 30).
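Rule 10 can be read as a clause-demarcation criterion. The sketch
below is a hypothetical illustration of that reading and not the
study's rule. It assumes that a conjunction opens a new clause only
when two conditions hold:

- the current clause already contains a main preposition, and
- another main preposition follows before the next conjunction.

The names `split_clauses` and `MAIN_PREPS`, and the choice of av
and om as the only main prepositions, are illustrative.

```python
MAIN_PREPS = {"av", "om"}        # hypothetical: prepositions of the av type
CONJUNCTIONS = {"och", "samt"}


def split_clauses(title: str) -> list[str]:
    """Split a title so that each clause holds at most one main preposition."""
    words = title.lower().split()
    clauses, current = [], []
    for i, w in enumerate(words):
        if w in CONJUNCTIONS and any(x in MAIN_PREPS for x in current):
            # Look ahead up to the next conjunction (or the end of the title).
            rest = []
            for x in words[i + 1:]:
                if x in CONJUNCTIONS:
                    break
                rest.append(x)
            if any(x in MAIN_PREPS for x in rest):
                clauses.append(" ".join(current))
                current = []
                continue  # the demarcating conjunction itself is dropped
        current.append(w)
    clauses.append(" ".join(current))
    return clauses
```

For title (38), the sketch returns two clauses, split at samt. In
this simplified form, title (37), which contains no main
preposition, remains a single clause.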

Apart from example (37) no questionable codings have emerged 
within this pattern type. 

The third pattern group is exemplified by the most common
structure of explicitly stated intentionality expanded with
first-degree extensionality, i.e. the 40 + 30 + 33 pattern.
The type can be realized as follows:






Matningar av sprakfardighet i tyska (39)
(Measurements of language proficiency in German)
40 30 33

Tva notiser om matning av forandring (40)
(Two notes on the measurement of change)
40 30 33

Urval av elever till teoretisk utbildning (41)
(Selection of pupils to theoretical education)
40 30 33



An interesting comparison can be made between (39) and (40). In
(40) matning (measurement) is the object of the study and does not
denote the activity itself, as it does in (39). Title (40) may be
interpreted as indicating that the problem matning av forandring
(measurement of change) has a precise meaning for the author. Thus
forandring (change) is not the problem; instead, av (of) has been
"degraded" by om (on), i.e. a problem component has already been
determined before av (of) operates. Unlike the author of
"Reflexioner om..." (Reflections on...) discussed above, the
author of (40) gives the impression of having a well-defined
problem area to deal with. Notes are a common form of presentation
when the content does not refer to an empirical investigation. In
(39), by contrast, the content does refer to one: the title
suggests that the author reports on matningar (measurements) that
have actually been performed. Title (40) indicates that the
author's aim is to discuss certain aspects of measurement; he need
not have performed any measurements himself.

Title (41) gives an example of further activities among
researchers and/or educational policy makers, namely the
development and testing of selection techniques. The example is
also interesting in that teoretisk utbildning (theoretical
education) determines or governs elever (pupils) and not urval
(selection), as it might seem to do at first sight. The title
should instead be interpreted as "urval av sadana elever som ar
lampliga att tillhora den grupp som gar i teoretisk utbildning"
(selection of such pupils as are qualified to belong to a group
participating in theoretical education). Here the preposition
till (to) has been judged the most adequate to express the
function of "assignment to". Had the title instead been worded
"Urval av elever for teoretisk utbildning" (Selection of pupils
for theoretical education), education would have been the goal of
the selection, i.e. the pupils would be expected to educate
themselves in theoretical subjects. No such expectation is
conveyed by till (to).

There are no unsatisfactory coding results to be reported with- 
in this pattern type. 

As shown by Table 3 in the previous chapter, the pattern dis- 
cussed is typical of research reports. However, reports are so 
frequent in the material that other pattern types also appear in 
them. Therefore, some examples will also be given of titles show- 
ing still greater structural complexity. 



Anvandning av ITV vid undervisning i muntlig (42)
(Use of ITV at instruction in oral
40 30 33 34

framstallning
presentation)

Studier av sociala relationer mellan barn i (43)
(Studies of social relations between children in
40 30 33 34

folkskoleklasser
elementary school classes)

No coding problems have emerged within these pattern types. The
"concentric principle" is at work in both (42) and (43), i.e. the
outermost extension demarcates the nearest inward concept.

An example of a title with a high degree of complexity which 
has been correctly coded is: 

Tva utredningar om relationerna mellan (44)
(Two investigations concerning the relations between
40 30 33

brukare, forvaltare och byggare med sarskilt
users, administrators and constructors with special
33 33 34

avseende pa barn och ungdom
reference to children and young people)
34

This is a research report, not a Swedish "utredning" in the form
of an Official Government Report. Investigating is, however, a
variant of research activity, and utredningar (investigations) is
therefore coded as a method. Whether the report is interpreted as
a kind of official investigation or as a presentation of other
investigations, it is representative of the differentiated
activities within this group of researchers. The example further
shows that the connective functions have not increased the coding
difficulties, and that the multi-word preposition med sarskilt
avseende pa (with special reference to) has operated correctly.
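For a multi-word preposition such as med sarskilt avseende pa to
operate as one unit, it has to be recognized before single-word
prepositions are considered. A common way of doing this is greedy
longest-match tokenization. The sketch below assumes that
technique and uses a small, hypothetical preposition inventory.
It does not claim that the study implemented the step in this
way.

```python
# Hypothetical inventory; multi-word entries are stored as word tuples.
PREPOSITIONS = [
    ("med", "sarskilt", "avseende", "pa"),
    ("med",), ("om",), ("mellan",), ("pa",),
]


def tokenize(title: str) -> list[str]:
    """Greedy longest match: a multi-word preposition becomes a single token."""
    words = title.lower().replace(",", "").split()
    patterns = sorted(PREPOSITIONS, key=len, reverse=True)  # try longest first
    tokens, i = [], 0
    while i < len(words):
        for p in patterns:
            if tuple(words[i:i + len(p)]) == p:
                tokens.append(" ".join(p))
                i += len(p)
                break
        else:  # no preposition starts at this word
            tokens.append(words[i])
            i += 1
    return tokens
```

In title (44), "med sarskilt avseende pa" comes out as one token.
In (52), the bare med is still recognized on its own.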

Within this type a couple of questionable codings have appeared, 
namely 



Promemoria rorande ett forskningsprojekt (45)
(Memorandum concerning a research project
40 30

angaende generationsmotsattningar och upptagande av
about generation gaps and adoption of
33 40(33) 30(34)

vuxenrollen
the adult role)

Preliminar redovisning av resultat fran en nordisk utprovning (46)
(Preliminary account of results from a Nordic test
40 30 33

av studiematerial och naringslivssynpunkter
of study material and economic views
34 34(30)

pa innehallet i dessa
on the content in these)
35(33) 36(34)

The first example may be compared with (38) above. After och (and)
the coding has been processed in the same way, but here och (and)
is not a disconnector. The correct coding within parentheses shows
that the title does not contain two conceptualizations. The
algorithm does not function when the patterns on the two sides of
a connector are not in balance, as here, where the elements are of
different kinds.

Example (46) is, on the whole, characterized by a rather high
degree of imbalance. Under the correct coding (see parentheses),
it connects two conceptualizations of different kinds, namely
40 + 30 + 33 + 34 and 30 + 33 + 34. The problem concerning the
second conceptualization seems to result from an ambiguous
abstraction. Moreover, the use of pronouns leads to special
difficulties in conceptual analysis, regardless of correct coding.

Finally, the last two incorrect codings will be presented. They
belong to a more complex pattern type than the ones just
discussed, since they represent a higher degree of intentionality.



Redogorelse for matningar av samband mellan (47)
(Account of measurements of relatedness between
40 30 33

uppforande respektive ordningsbetyg och
behaviour marks respectively discipline marks and
33 40(33)

amnesbetyg for elever pa hogstadiet
subject marks for pupils at upper comprehensive school level)
70 73

En jamforelse mellan tva system for bedomning (48)
(A comparison between two systems for evaluation
30 33 70

och betygsattning av fysikskrivningar i gymnasiet
and grading of physics exams in the gymnasium)
40(70) 30(73) 33

In title (47) the main rule has operated at for (for), preventing
the balancing rule at och (and) from operating. The second example
contains an additional error: fysikskrivningar (physics exams) has
been assigned a main code (30 instead of 73). This is a consequence
of och (and) not being able to operate as a clause demarcator
(rule 10).

Within the Goal complex fysikskrivningar (physics exams) con- 
stitutes the problem, while the report focuses on the comparison 
between the systems. This double function in the components of 
the Goal complex may be compared with the title below. 

En observationsteknik for bedomning av (49)
(An observation technique for evaluation of
40 70 30

samarbetskarakteristika vid grupparbete
cooperation characteristics in group work)
33

Only one problem is explicitly stated in this title, but the 
Problem component in this type of title has a double function, 
which mirrors the research process very well. 

There is a small number of titles of this type. Title (49) should
be interpreted to mean that an observationsteknik (observation
technique) is developed in relation to a certain problem, here
samarbetskarakteristika (cooperation characteristics). In such
cases the problem is used instrumentally. Only once the technique
has been developed can the bedomning (evaluation) of the problem
take place.

Further examples of this construction are "Ett system for
klassificering av feltyper i diagnostiska skrivprov" (A system for
classification of error types in diagnostic written tests), "Ett
attitydformular for studium av elevernas installning till
skolmiljon" (An attitude form for the study of the pupils'
attitude to the school environment), and "Forskningsprogram for
processanalyser av arskurslost hogstadium" (Research program for
process analyses of non-graded upper comprehensive school). Seen
along the time dimension, the activities denoted as goals by these
titles must be placed after the development of the method. In
titles like (48) it would be quite feasible to mark this double
function of the for (for) structure in a second step. But whether
and how this is done will depend on the functional utilization of
the register, which will to a great extent be an empirical
question.

The five titles presented above as incorrectly coded make up all
the errors in the material tested. Relative to the number of
patterns (n = 871), the proportion is .0057, i.e. fewer than six
titles per thousand patterns have been coded incorrectly. All of
these errors are due to the inability of the demarcation rules to
handle the difference between connection and disconnection.
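The error proportion can be checked directly. The only inputs are
the five incorrectly coded titles and n = 871 from the text.

```python
errors = 5       # incorrectly coded titles: (37), (45), (46), (47), (48)
patterns = 871   # number of patterns in the material tested (n)

proportion = errors / patterns
per_thousand = 1000 * proportion

print(f"{proportion:.4f}")    # 0.0057
print(f"{per_thousand:.1f}")  # 5.7
```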





6.2 Intermediate language functions 

With the representational function of language as a starting 
point, it was assumed in Chapter 3 that there exist different 
levels of representation as a consequence of the different trans- 
formational stages through which documents pass in content de- 
scription processes. This assumption also implies that different 
levels of representation are characterized by different degrees 
of abstraction. In this respect the intermediate language has 
been defined as the structural representation that should be 
used in a thesaurus for communication between author and informa- 
tion searcher. The starting point for the generation of this 
language is the organization of the titles. 

In general, intermediate languages have a higher degree of 
"formal logic" than natural languages. In contrast, the perfor- 
mance of the natural language is more dependent on "psycho-logic" 
(cf. Abelson & Rosenberg, 1958) than are languages of formal 
logic. The psycho-logic is represented by the way in which ab- 
stractions are formed and how they are conveyed and related in a 
language structure. To relate objects and concepts functionally 
seems to be the child's "psycho-logical" perception of its envi- 
ronment. Only as adults do humans "learn" to structure their en- 
vironment according to formal logic, which means that the natural 
language is provided with formal classes (cf. Miller, 1967).

The algorithm developed in the present study for the organiza- 
tion of concepts is based on assumptions of relations with a 
higher degree of formalization than natural language, i.e. the 
algorithm is based on intermediate language functions. The con- 
cepts are assumed to be part of cognitive structures which, be- 
cause of the degree of formalization in the titles, are possible 
to code automatically. The functional relations are here
signalled by prepositions. The preceding section presented some
coding results in which the algorithm generated proposals that did
not meet expectations. In those cases the logic presupposed by the
algorithm did not coincide with the structure of the title. It is
the conjunctions that have caused trouble: the proper
discrimination between connection and disconnection has not been
made. This implies that the functional role of the concepts has
not been distinguished either. This section examines whether the
structural representation of the concepts in a title corresponds
to the expected degree of abstraction.

The concepts determined by prepositions and conjunctions are 
assumed to have such a construction at the intermediate level 
that they may, without their context, become functional entities 
in a register and represent the title from which they are gener- 
ated. From this point of view certain characteristic features in 
natural language cannot be accepted, such as inference and refer- 
ence. As pointed out in Chapter 4, relations in natural and in- 
termediate structures are expressed in different ways. A natural 
language is more concrete than the intermediate variety. This 
basic difference may be compared with the active - passive dimen- 
sion. An active way of writing is "close to things", i.e. close 
to what is to be described, while a passive way of writing in- 
creases the distance between the writer and what is to be de- 
scribed. Some examples of titles which may be discussed from this 
point of view are given in Figure 6. 

Within the context of the functions of the coded units and the 
operational procedure of the algorithm, certain aspects of the 
titles in Figure 6 call for some comments. 
1. Baskunskaper och basfardigheter - sedda ur perspektivet
   skolans overgripande mal.
   (Basic abilities and basic skills - seen from the perspective
   of the school's general goals.)
   30 30 33

2. Projekt UG. Ungdom i Goteborg. Skolsegregation. Forekomst och
   vissa effekter darav.
   (Project UG. Young people in Goteborg. School segregation.
   Presence and certain effects thereof.)
   30 30 33 30 30 30

3. Tjugosex ars uppfoljning av en grupp elever,
   (Twenty-six years' follow-up of a group of pupils
   40 30
   som avgatt enligt folkskolestadgans paragraf 48.
   who dropped out according to elementary school regulations'
   paragraph 48.)
   30 33

4. Kamratbedomning som validitetskriterium och som medel att
   studera gruppdynamiken.
   (Peer evaluation as validity criterion and as means to study
   the group dynamics.)
   30 33 33

5. Att mata attityder till jamstalldhet.
   (To measure attitudes to equality.)
   30 33

Figure 6. Examples of structural levels in titles.



The first title in Figure 6 has a small gap in the form of the
participle sedda (seen). This unit will be taken as an independent
concept for two reasons: the dash has a demarcating function, and
ur (from) is the operator of a new unit. If, however, sedda (seen)
had been eliminated (which would have been better), the last unit
would have been assigned the code number 30, since rule 4 would
then have operated. To maintain the relation between super- and
subordination, the dash has to be eliminated. The algorithm thus
cannot handle this construction. No incorrect coding has been
performed, however. Through the identification coding (see Box 5,
Chapter 5.1), the elements belonging to the same title are not
lost either.

The dimension denoted by sedda (seen) is neither active nor
passive (cf. Mittelwort in German). At an intermediate level,
however, the participle form is too "concrete". The participle is
already implicit in the phrase ur perspektivet (from the
perspective).

The second title gives a staccato-like impression at the
beginning. Its last conceptualization contains a component from
natural language which requires reference. It is unclear what
darav (thereof) refers to, since this depends on whether forekomst
(presence) in turn refers to skolsegregation (school segregation).
An anaphoric pronominal word is thus inappropriate, given the
function that a title should have. The unit assigned to the
Problem register would have the structure vissa effekter darav
(certain effects thereof). However, the identification code links
the unit to the right title.
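The role of the identification code can be sketched with a simple
data layout. This is a hypothetical illustration and not the
layout of Box 5. Each generated unit is stored in the register for
its code, together with an identifier of its title and its
position in that title. A unit such as vissa effekter darav, which
cannot be interpreted on its own, can then be traced back to the
complete title. The identifier "F6-2", the field names, and the
function names are invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    title_id: str  # hypothetical identification code of the source title
    position: int  # order of the unit within its title
    code: int      # register code, e.g. 30 = Problem, 33 = first extension
    text: str


def build_registers(units):
    """Group units by register code; each unit keeps its title identifier."""
    registers = defaultdict(list)
    for unit in units:
        registers[unit.code].append(unit)
    return registers


def units_of_title(registers, title_id):
    """Collect every unit of one title across all registers, in title order."""
    found = [u for reg in registers.values() for u in reg if u.title_id == title_id]
    return sorted(found, key=lambda u: u.position)


# Title 2 of Figure 6 with the codes shown there; "F6-2" is an invented identifier.
sample = [
    Unit("F6-2", 0, 30, "Projekt UG"),
    Unit("F6-2", 1, 30, "Ungdom"),
    Unit("F6-2", 2, 33, "i Goteborg"),
    Unit("F6-2", 3, 30, "Skolsegregation"),
    Unit("F6-2", 4, 30, "Forekomst"),
    Unit("F6-2", 5, 30, "vissa effekter darav"),
]
registers = build_registers(sample)
```

Here `registers[30]` holds five Problem units from the same title.
`units_of_title(registers, "F6-2")` restores all six units in
their original order.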

This staccato-like title is not suitable for any kind of
information search other than one performed via keyword
representation (i.e. a search for the presence of terms in the
same title). The reason is that the author did not indicate how
the first three conceptualizations are related to each other.

Another example of reference is given in the third title, whose
gap gives an even more concrete impression than the first two.
With respect to the kind of pupils that the problem refers to, the
second part of this title seems somewhat too long. A specific
term, studieavbrytare (drop-out), now exists to designate this
kind of pupil. Here, then, is an example of a more natural, i.e.
more concrete, level. The generated unit som avgatt (who dropped
out) belongs to the active way of writing mentioned above; verb
forms make for concreteness (cf. Chapter 4). As soon as a
phenomenon has been studied and defined, and thus incorporated in
a certain conceptual structure, it can be communicated in
abstracted form. Two titles by another author may be examined as
further examples of this.



Forsoksverksamhet med nya former for samarbete (50)
(Experimental work with new forms for collaboration
40 80 70

mellan studerande, larare och ovrig personal vid
between students, teachers and other staff at
73 73 73 74

lararutbildningsanstalter
teachers' training colleges)

Forsoksverksamhet med nya samarbetsformer vid (51)
(Experimental work with new collaboration forms at
40 80 83

lararhogskolan i Malmo
the School of Education in Malmo)
84

The first example is taken from a mimeographed paper produced in
1969. Its goal is collaboration between different categories of
staff. To achieve this, some experimentation is carried out with
nya former (new forms), which are only vaguely defined. When a
research report appears in 1972 (example (51)), the "forms" are
more sharply defined, and the author is able to form the concept
samarbetsformer (collaboration forms).

From the point of view of language structure, such a contraction
is relatively simple. It is more difficult to accomplish in
example 3 in Figure 6. The phrase paragraf 48-elever (paragraph 48
pupils) has been used at an intermediate stage. The pronoun som
(who) has not been interpreted as a prepositional som (as), in the
operational sense, because of the comma. The rules do not accept
relative pronouns, but this case has nevertheless been correctly
coded.

In contrast to the relative pronoun just discussed, the fourth
title shows the coding of a prepositional som (as), which does not
cause any gap. The two 33 units are interrelated and assigned to
the same register, but the second part is out of balance with the
first. The author assumes validitetskriterium (validity criterion)
to be a communicative concept. The second concept, by contrast,
could have been formalized as gruppdynamiksobservationsinstrument
(group dynamics observation instrument). The cause of the
imbalance is probably that the construction with att (to) is used
when a more general concept cannot be formed. Forming such a
concept requires unambiguous relations to have been established in
the research process, providing the basis for the meaning
conveyed. The more explicit level of structuring that
characterizes this unit may require its elements to be treated as
kinds of elementary units. That in turn calls for other
algorithmic analyses than are necessary at an intermediate level
(cf. Chapter 4).

The final example in Figure 6 is a case where an att (to)
infinitive has been chosen instead of the verbal noun matning
(measurement). The infinitive form should rather be interpreted as
part of the problem: the problem is att mata (measuring, to
measure) attitudes, together with a discussion of how this problem
area could be tackled. In this connection it may be noted that the
same author uses a construction with att (to) in another function,
namely in the title

Att mata varldsmedborgaransvar med (52) 

(To measure world citizenship responsibility with 
40 80 

projektiva test 
projective tests) 

Here it is explicitly stated that a process takes place - the 
instrument employed is mentioned. The researcher states the 
approach taken, which does not have to be the case in the title 
in Figure 6. Att mata (measuring, to measure) is in this case 
more related to a research situation than in the example in the 
figure. The title may be interpreted as describing a stage in a 
construction process. From this point of view the coding algo- 
rithm may be said to capture the underlying meaning of the con- 
struction. 

The last two examples have such a high degree of formalization
that the infinitive and the following unit could be decomposed in
a second step. Att mata (measuring, to measure) would then be
analysed as method in both cases, and the second unit as problem.
But it can be assumed that authors formulate their titles very
consciously, especially when the title is highly formalized in
other respects. It should therefore be possible for an initial att
(to) construction to enter a register without causing any
difficulty. What the concept in example (52) says is that the
methodological activity refers to test construction. This implies
that it lies within the same functional domain as, for example,
the concept intelligenstestning (intelligence testing). As a
problem concept, att mata attityder (measuring / to measure
attitudes) is closer to different kinds of attitudes: at a later
stage the same author has used jamstalldhetsattityder (equality
attitudes).

The principal difference between att mata (measuring, to measure)
and matning (measurement) in this material seems to be the
following. The verbal noun is used when the method is determined
and the problem area is well-defined (cf. example (31) in Chapter
6.1). The infinitive belongs to titles which deal with preliminary
stages.

The titles discussed so far contain, to some extent, structures
from a language more natural than is appropriate for the automatic
coding of units. This matters particularly when the units are
required to function as representations of concepts. There is also
a small number of titles in the material which, according to the
above discussion, are not characterized by an intermediate
structure. Some examples:



Vad ar pedagogik? (53)
(What is pedagogics?)
30

Lonar sig utbildning? (54)
(Does education pay?)
30



The typical feature of these titles is that they are complete
sentences. In these cases the whole sentence is assigned to the
Problem register, but no difficulties arise in their
interpretation. A highly structured title is more abstract than
examples (53) and (54): it keeps a greater distance from the
phenomenon dealt with. Such a correspondence between structure and
aspect could be utilized in developing a structurally adapted
algorithmic analysis. Two further titles from the material
illustrate what happens when the logic of the coding algorithm and
that of the title do not agree. Both are written in a natural
language variant reminiscent of exchanges of words in a discourse
situation.

Sa lange vi har snus, knackebrod och (55) 

(As long as we've got snuff, crisp bread and
30 30 30 

fruntimmer, sa nog blir Sverige forsvarat 
women, sure Sweden will be defended) 
30 

Man kan ju faktiskt fa reda pa ett och (56) 

(You can actually get hold of one thing or 
30 33 40 

annat om tentan, forstas 
another about the exam, you see) 
30 30 

The first title would generate terms which would be unacceptable 
in a thesaurus for this research field. The second generates non- 
sense elements, whose functional relations cannot be processed by 
the proposed algorithm. However, the two titles have subtitles of 
a "normal" kind, which guarantee that the information conveyed 
can still be stored in the system. 

Titles of the type discussed in this section make up
approximately 5% of the material examined. This means that the
coding algorithm and the titles share, on the whole, a common
logic concerning the structure of the elements to be organized in
the registers. These automatically generated registers form the
basis for determining the vocabulary of a thesaurus.
Well-functioning structures and concepts should thus be available
in the titles.

Titles of scientific documents are characterized by different
patterns with respect to the presence and order of the components.
This analysis refers to these patterns as structural complexity.
Structural complexity seems to co-vary with the type of document
(also called representation form). The gaps pointed out in this
chapter indicate conceptual variation, in the sense that an
explicit "natural" structure reflects a "lower" level of
transformation or abstraction.

With this discussion and the analyses presented here as a general
framework, a final description of what is intermediate in
documentary language will be proposed. For this purpose, Figure 7
sketches the basic principles in the form of a decision tree. For
the sake of simplicity, the relations between the different
language types are treated in an either-or manner. The terminating
\/ in the figure has a bracketing function, i.e. "Indexing
language" and "Retrieval language" should be considered dependent
on each other.



[Figure 7: decision tree; recoverable nodes: Means of
Communication, V & R, Indexing language, Retrieval language, IL,
Thesaurus]

V = vocabulary
R = rules
NL = natural language
AL = artificial language
IL = intermediate language

Figure 7. Relations between language types

For a means of communication to be called a language, it has to
have a vocabulary (lexicon) and a system of rules. If only one of
these components exists, or if instructions are missing as to how
the vocabulary (V) and the rules (R) are to function together, the
result is called a "non-language". As mentioned in Chapter 3, this
characteristic (V & R) is a basic feature of both natural (NL) and
artificial languages (AL). The principal difference between NL and
AL is that AL requires more precise definitions and more general
functions than NL does.
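The upper branches of the decision tree can be written out as a
small classification function. This is a simplified sketch of the
either-or reading adopted above. The field and function names are
hypothetical. The difference between NL and AL is reduced to a
single flag for precise definitions, and the lower branches
(indexing language, retrieval language, IL) are not modelled.

```python
from dataclasses import dataclass


@dataclass
class MeansOfCommunication:
    has_vocabulary: bool       # V
    has_rules: bool            # R
    has_instructions: bool     # how V and R are to function together
    precise_definitions: bool  # AL requires more precise definitions than NL


def classify(m: MeansOfCommunication) -> str:
    """Either-or branches of the upper part of the decision tree (simplified)."""
    if not (m.has_vocabulary and m.has_rules and m.has_instructions):
        return "non-language"
    if m.precise_definitions:
        return "artificial language (AL)"
    return "natural language (NL)"
```

A vocabulary without rules, or rules without instructions for
combining them with the vocabulary, both end in "non-language".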

In the present material, the language of the titles is to form the
basis for generating an intermediate language (IL). It must
therefore lie within the functional domain of AL. However, the
analyses have shown that certain features of NL appear in a few
titles, which means that the degree of artificiality in the
language of titles may vary. Features of NL may possibly be
neutralized with the aid of indexing languages and indexers (but
cf. examples (55) and (56) above). If, on the other hand, the
title is to be used in building up a retrieval language, more
stringent requirements have to be imposed on terminology and
logic.

The coding mechanism presented in this work is based on a
cognitive model that adds a new dimension to the information
system. The relations between terms are indexed automatically,
i.e. they are recognized in the construction of a retrieval
language. The thesaurus then has a mediating function between
indexing and retrieval. The title, too, has this intermediate
function. Insofar as the title's conceptual and structural logic
can be used for indexing and retrieval, its language is
intermediate.

In order to specify the intermediate language function of the 
planned thesaurus in more concrete terms, the following section 
will present a display from the generated registers with examples 
of the structure of the vocabulary. The function of the registers 
will also be demonstrated. 



6.3 Generated registers 

Based on the coding system, registers with different functions
have been built up. The functions refer to the three fundamental
components of the model. The units assigned to a given register
therefore have something in common: they have the same function.
Having the same function does not imply, however, that the units
must be homogeneous in other respects. Obvious examples of this
have been demonstrated for the method function, where different
aspects of "method" may be distinguished in the form of research
methods, teaching methods, methods for reporting, etc.

The construction of the thesaurus, which is the next step, will
not be further discussed here. However, the content, the function,
and the proper inferences of the registers constitute the
immediate basis for developing and testing both the terminology
and the retrieval mechanism. The purpose of this last section is
therefore to give an impression of what the registers contain in
principle and how they may function. A vocabulary study of all
registers (about 2,100 generated units) has been carried out, and
some different aspects (facets) can be distinguished.

The problems (register 30) typical of the field of educational
research are of a general, discipline-oriented kind, e.g.
pedagogik (pedagogics), psykologi (psychology), edukation
(systematic instruction), and fostran (upbringing). This register
also describes subfields with a wide range of meaning, such as
begavning (ability), inlarning (learning), intelligens
(intelligence), and prestation (performance). Other, more
teaching-oriented fields are sprakfardighet (language proficiency)
and basfardigheter (basic skills). A third problem type concerns
problems within research itself, such as datainsamling (data
collection) and testning (testing). As pointed out earlier, the
problem orientation is the most typical feature of this material.
The problems are often explicitly demarcated (register 33).
Typical extensions are localization in space (countries and
places) and in time (ages and grades). Further, the problems are
often related to school forms, e.g. grundskolan (comprehensive
school) and gymnasiet (gymnasium), and to subjects of study. More
general types, such as skolorganisation (school organization) and
samhalle (society), appear as well. Registers 34-37 contain
similar units. For example, they denote school forms, levels,
subjects of study, and populations of investigation, such as
elever (pupils), flygforare (aircraft pilots), and skolledare
(school principals). These registers have more in common with each
other than with register 33. The closer a register's function lies
to the main register, the more general the terms it contains.

This register complex can now be compared with Instrument and 
Goal. The instrumental vocabulary mainly concerns different 
material-method systems, e.g. programmerad undervisning (programmed 
instruction) and tvåspråkiga ordlistor (bilingual word lists), or 
instruments employed in data collection, such as observationer 
(observations), personlighetsschemata (personality schedules), and 
videobandspelare (videotape recorder). Extensions, when they 
appear, are here, too, of the localization type. The main aspects 
in the Goal register refer to persons, e.g. dialekttalande elever 
(dialect-speaking pupils), lärare (teachers), föräldrar (parents), 
synsvaga (visually handicapped [people]), and psykiskt 
utvecklingshämmade (mentally retarded [people]), and to educational 
devices, e.g. lärarutbildning (teacher training), skolan (school), 
and högre studier (higher education). The extensions that exist 
(registers 73, 83) refer mainly to groups of persons, but also to 
school levels. Localizations are grouped within the other 
subordinate registers.

The observations in this short summary, together with the examples 
of titles from the material already presented, justify even greater 
expectations as regards the contents of the registers. Finally, a 
study of register 40 yields the following features.

The activities are to a great extent purely research-oriented, 
expressed by terms such as forskning (research), analys (analysis), 
design (design), urval (selection), kartläggning (mapping), 
undersökning (investigation), and the like. Researchers measure 
effects, construct tests and questionnaires, make data analyses, 
etc. But other expressions of activity are also numerous. For 
example, various matters are dealt with and discussed in the form 
of funderingar (reflections) and några metodiska synpunkter på 
(some methodological views on), or suggestions are presented as a 
förslag (proposal) or a promemoria (memorandum). Investigations of 
educational matters and the production of textbooks are also common 
activities, e.g. utredning (investigation), rekrytering 
(recruitment), en handbok (a handbook), and lästeknik (study 
techniques).

Before the functioning of the registers is demonstrated, Box 8 
provides a display from registers 30, 40, 80, and 70, i.e. those 
that correspond to the main components of the model.
Box 8. Register display

Problems (from register 30)

aggressivitet (aggressiveness)
allmänbegåvning (general ability)
begåvning (ability)
begåvningsreserven (the ability reserve)
begåvningsurvalet (the ability selection)
en experimentell studie (an experimental study) *
individualism (individualism)
inlärning (learning)
instudering (studying)
intelligens (intelligence)
intelligensbegreppet (the intelligence concept)
intelligenskrav (intelligence requirements)
intelligenskvot (intelligence quotient)
intelligensstandard (intelligence standard)
intelligensålder (mental age)
inåtvändhet (introspectiveness)
kreativ utveckling (creative development)
meditation (meditation)
medvetandet (the consciousness)
mognande (maturation)
mätning (measurement) *
neuros (neurosis)
personlighetspsykologiska faktorer (personality factors)
personlighetsutveckling (personality development)
självbedömning (self-evaluation)
självförverkligande (self-realization)
självständighet (independence)
specialbegåvning (special ability)
temperamentslära (theory of temperament)
ett ungdomspsykologiskt problem (a juvenile problem)
undergivenhet (submissiveness)
uthållighet (tenacity)
den utvecklingshämmades identitetsutveckling (the identity development of mentally retarded [people])
utåtvändhet (extrovertness)
vardagsinlärning (everyday learning)

* Example of functional dissimilarity


Box 8 (cont.)

Methods (from register 40)

analys (analysis) *
bearbetning (processing)
beskrivning (description)
design (design)
diskussion (discussion)
effektmätning (measurement of effects)
en empirisk studie (an empirical study)
erfarenheter (experiences)
en experimentell studie (an experimental study) *
faktoranalys (factor analysis)
funderingar (reflections)
försök (experiment)
försöksverksamhet (experimental work)
granskning (examination)
en hypotesprövande undersökning (a hypothesis-testing investigation)
intelligenstestning (intelligence testing)
kartläggning (mapping) *
klassificering (classification) *
konstruktion (construction)
kvantitativa studier (quantitative studies)
en longitudinell studie (a longitudinal study)
metodutprövning (testing of methods)
mätning (measurement) *
psykologiska undersökningar (psychological examinations)
reflexioner (reflections)
resultatredovisning (account of results)
sammanställning (compilation)
skola (school) *
standardisering (standardization)
studier (studies)
testkonstruktion (test construction)
en uppföljning (a follow-up)
upplevelse (experience)
urval (selection)
utvärdering (evaluation)
Box 8 (cont.)

Instruments (from register 80)

analys (analysis) *
bedömning (assessment)
critical incident-metoden (the critical incident method)
elektronisk databehandling (electronic data processing)
intervjuer (interviews)
läromedelsframställning (construction of teaching materials)
några modeller (some models)
observationer (observations)
projektiva test (projective tests)
psykoterapi (psychotherapy)
skolan (the school) *
skolklinik (school clinic)
två typer av inlärningsmaterial (two types of learning materials)
utbildning (education)

Goals (from register 70)

alkoholskadade föräldrar (alcoholic parents)
anpassade (well-adapted [people])
dialekttalande elever (dialect-speaking pupils)
elever (pupils)
engelskundervisning (English language teaching)
fackskolan (professional training school)
föräldrar (parents)
grundläggande matematik (basic mathematics)
gymnasiala skolor (gymnasial schools)
högstadiet (the upper comprehensive school)
institutionsdemokrati (departmental democracy)
kartläggning (mapping) *
klassificering (classification) *
lärare (teachers)
lärarkandidater (teacher trainees)
psykologer (psychologists)
psykologutbildningen (psychologist education)
registrering (registration)
samarbete (collaboration)
skolan (the school) *
studenter (students)
synsvaga (visually handicapped [people])
tvåspråkiga elever (bilingual pupils)
ungdom (young people)
utvecklingsstörda (mentally retarded [people])
vuxna (adults)

* Example of functional dissimilarity






The units are listed in the form in which they were generated, but 
without operators.

The internal structure of the units in the registers will be 
studied closely in connection with the construction of the 
thesaurus. It will then be important to discuss the kind of 
"similarity" between terms that determines facets. That 
discussion, however, lies outside the scope of this study.

By now, the contents of the registers should need no further 
comment. Examples marked with an asterisk show how identical units 
can have more than one function. Kartläggning (mapping) may be a 
method in one title but a goal in another; likewise, en 
experimentell studie (an experimental study) may be both method 
and problem. As pointed out in the analysis in Chapter 6.1, these 
examples show that lexical forms need not be associated with the 
components of the model that correspond to those of a more 
linguistically oriented analytical paradigm. Words that give a 
concrete impression of a "thing" may thus function as "Verbs". 
This analysis distinguishes and brings to the fore dimensions that 
a manual analysis would not have succeeded in identifying, owing to 
habitually learned conceptions and classifications. This is 
exemplified in Box 9 by the concept skola (school), which, as part 
of various compounds, has a high frequency in the material 
examined.
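The asterisk marking in Box 8 can be reproduced mechanically by inverting the registers: each unit is mapped to the set of components in whose registers it was generated, and units found under more than one component are flagged. The sketch below is a hypothetical illustration in Python, not the original ASCII-FORTRAN program; the register contents are a small excerpt of Box 8, and the names `REGISTERS`, `COMPONENT`, and `functionally_dissimilar` are introduced here only for illustration.

```python
from collections import defaultdict

# Excerpt of Box 8: register number -> generated units (operators removed).
# Only a few entries per register are copied here for illustration.
REGISTERS = {
    30: ["begåvning", "en experimentell studie", "inlärning", "mätning"],
    40: ["analys", "en experimentell studie", "kartläggning",
         "klassificering", "mätning"],
    70: ["elever", "kartläggning", "klassificering", "lärare", "skolan"],
    80: ["analys", "intervjuer", "observationer", "skolan"],
}

# Register number -> component of the model.
COMPONENT = {30: "Problem", 40: "Method", 70: "Goal", 80: "Instrument"}


def functions_by_unit(registers):
    """Invert {register: units} into {unit: set of component names}."""
    index = defaultdict(set)
    for register, units in registers.items():
        for unit in units:
            index[unit].add(COMPONENT[register])
    return index


def functionally_dissimilar(registers):
    """Return units generated under more than one component (the asterisked ones)."""
    return {
        unit: sorted(components)
        for unit, components in functions_by_unit(registers).items()
        if len(components) > 1
    }


for unit, components in sorted(functionally_dissimilar(REGISTERS).items()):
    print(f"{unit}: {', '.join(components)}")
```

Running it lists the six asterisked units of this excerpt, e.g. kartläggning under Goal and Method. Because units are matched as identical strings, skola (40) and skolan (70, 80) are not linked; relating them would require some normalization of the definite form, which this sketch does not attempt.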

As a problem, skola (school) is a subject of research, development, 
debate, and change. All this is expressed in the Problem register 
under skolorganisation (school organization). When skola (school) 
functions as an extension of a problem, it is the school form that 
is expressed. The school form is often the goal of an activity as 
well. The last two examples from the Goal register illustrate a 
textbook aspect of skola (school). In an instrumental function, 
skola (school) is an aid; this function may be regarded as the 
social aspect of the school.

Finally, it should be stressed that the concept skola (school) in 
fact also functions as method. In this sense, the school may be 
seen as a strategy by means of which one wishes to bring about a 
change in society. The Method component is here given a broader 
meaning, since the school may also be seen as an instrument. 
Method and instrument are components which can form 
method-(instrument)-goal hierarchies in relation to the degree of 
complexity in the desired goals. In order to reduce method and 
instrument to a single concept, the term "means" is used.

Box 9. Exemplification of concepts in functional registers.

Skolorganisation (School organization) (30)
skolan (the school)
skolans socialisation (the socialization of the school)
skolans utveckling (the development of the school)
skolans kris (the crisis of the school)
skolans sociologi (the sociology of the school)
skolnivå (school level)
skolsegregation (school segregation)

Skolform (School form) (33)
skolan (the school)
grundskolan (the comprehensive school)
fackskola (professional training school)
gymnasieskola (gymnasium)
den svenska enhetsskolan (the Swedish comprehensive school)
förskola (nursery school)
den obligatoriska skolan (the compulsory school)

Skolform/skolstadium (School form/school level) (70)
skolan (the school)
gymnasiala skolor (gymnasial schools)
grundskolans mellanstadium (intermediate school)
de första årens räkneundervisning (mathematics instruction in the first school years)
Fackskolan 2 (Professional Training School 2)

Undervisningsorganisation (Instructional organization) (80)
skolan (the school)
skolklinik (school clinic)
skolväsendets utveckling (development of the school system)

Utbildningsmedel (Educational means) (40)
skola (school)
Consider this title, written long before 1980: 



Skola för 80-talet                          (57)
(School for the 80's)
 40         70



In the light of the above discussion and analysis, the means-goal 
relation expressed in this title probably reflects what the 
author wants to communicate. Knowledge of the author's activities 
and field of inquiry in Swedish educational research can validate 
the interpretation proposed. 
7. CONCLUSIONS

Obtaining access to information is regarded as an ever-growing 
problem by an increasing number of people. Moreover, information 
seems to be becoming more and more abstract. It is therefore of 
great importance to develop methods and techniques that pave the 
way for a user-oriented organization and re-organization of 
information.

Within information science and related disciplines there is a 
trend away from systems based on thinking in terms of classes, 
towards systems built on thinking in terms of functions. This 
change requires explicitly formulated cognitive models; otherwise, 
the goal of re-organizing information can hardly be attained.

In modern I & D systems, thesauri will be of central importance 
for communication between the producer and the user of 
information. Because thesauri have an intermediate function, their 
language structure must be given particular attention.

The first description of a document that the user of an I & D 
system receives is the title, which is often the only description 
of the document. For this reason, titles have been the point of 
departure in the present study, which concerns the development of 
a method for generating an intermediate language. The title is 
assumed to be a condensed version of the many empirical 
observations described in a document. Prepositions have been used 
to analyse the relations between the components of a title. To use 
prepositions as pointers to concepts and conceptual relations, it 
is essential that their ambiguity at this abstract level can be 
eliminated.

In the analysis two kinds of prepositions can be distinguished: 
prepositions referring to intentions, and prepositions referring 
to extensions. The result of the analysis and the conclusion that 
prepositions have precise organizing functions have led to the 
development of an algorithm which makes possible a conceptual 
coding of titles and the generation of registers with functionally 
defined content. 
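The two kinds of prepositions can be illustrated with a toy coder for a single, already demarcated conceptualization. This is a sketch under stated assumptions, not a reconstruction of the original program: the operator dictionary `OPERATORS` and the function `code_title` are hypothetical, and only the treatment of för as an intentional preposition introducing the Goal (register 70) is taken from the coded title Skola för 80-talet in Chapter 6.3. The entries for av, med, and i, and the convention that an extension goes to the register of the preceding component plus 3 (by analogy with registers 33, 73, and 83), are guesses for illustration. Following the forward-pointing principle, words not claimed by any operator are assigned to Method.

```python
# Hypothetical operator dictionary: preposition -> (kind, register).
# Only "för" introducing the Goal (70) is attested in the text, via the
# coded title "Skola för 80-talet"; the other entries are illustrative guesses.
OPERATORS = {
    "av": ("intention", 30),   # assumed: introduces the Problem
    "med": ("intention", 80),  # assumed: introduces the Instrument
    "för": ("intention", 70),  # introduces the Goal
    "i": ("extension", None),  # assumed: localization of the preceding component
}

METHOD = 40


def code_title(title):
    """Code one demarcated conceptualization into {register: [units]}.

    Prepositions point forwards: an operator decides the register of the
    words that follow it. Words claimed by no operator form the residue,
    which is assigned to Method once at least one intentional preposition
    has been found; otherwise it is left undetermined (key None).
    """
    segments = []          # (register or None, unit) in title order
    register = None        # register for the words currently collected
    component = None       # most recent main component, used by extensions
    seen_intention = False
    words = []

    for token in title.split():
        op = OPERATORS.get(token.lower())
        if op is None:
            words.append(token)
            continue
        if words:
            segments.append((register, " ".join(words)))
            words = []
        kind, target = op
        if kind == "intention":
            register = component = target
            seen_intention = True
        elif component is not None:
            register = component + 3  # assumed: 30 -> 33, 70 -> 73, 80 -> 83
        # an extension before any main component leaves the register unchanged

    if words:
        segments.append((register, " ".join(words)))

    residue = METHOD if seen_intention else None
    coded = {}
    for reg, unit in segments:
        coded.setdefault(residue if reg is None else reg, []).append(unit)
    return coded


print(code_title("Skola för 80-talet"))  # {40: ['Skola'], 70: ['80-talet']}
```

With the assumed entries, an invented title such as "En experimentell studie av inlärning i grundskolan" would be coded as Method, Problem (30), and Problem extension (33). A title with no intentional preposition at all, such as Begåvningsreserven, is returned with its residue undetermined, since the sketch makes no assumption about how such titles are coded.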

An analysis of the structure of titles shows that patterns can be 
detected that are typical of certain types of documents and less 
typical of others. These so-called type patterns are characterized 
by different degrees of structural complexity. Moreover, this 
structural analysis can be validated against responses from an 
interview study.

The algorithmic analysis of the concepts and conceptual relations 
in the titles shows that a title characterized by structural 
asymmetry, and containing elements that belong to different levels 
of abstraction, seems to be in a state of imbalance. This imbalance 
leads to misinterpretation, i.e. coding errors. Finally, it should 
be pointed out that the algorithm identifies concepts and assigns 
them to the registers in a functionally more adequate way than a 
conventional manual analysis would probably have achieved.

The present analysis may be regarded as an initial attempt to 
study a problem area that could be described as "the re-cognition 
of highly abstracted information". A continuation of this line of 
research would have important implications for the organization 
and dissemination of information. It can be expected that in the 
decade to come, people will no longer ask only for the kind of 
information service currently available. Rather, what we will 
probably ask for are "solutions to problems". This development 
will confront information scientists, especially those who are 
oriented towards language processing, with new problems. It is 
most likely that the future will focus on such linguistically 
based analyses of structure as are aimed at simplifying the repre- 
sentation of abstracted information. 
8. SUMMARY

When the purpose of a research activity is to develop methods and 
techniques for a systematic analysis of the information that 
language conveys, questions about the structure assumed to 
characterize a message receive special emphasis. It is therefore 
of considerable importance to develop a theory that can explain 
cognitive phenomena as they appear through language.

This work has focused on the development of a method and a 
technique for mapping the cognitive structure through which the 
author of the title of a scientific report communicates what the 
report is meant to contain. Developing a model and a theory for 
this purpose emerges as the basic problem for modern information 
and documentation (I & D) systems. A dynamic structure should be 
the goal in constructing the mechanisms on which I & D systems are 
based, since information is characterized by the structuring and 
re-structuring of data and is thus subject to constant change.

There are different forms of representation with different goals. 
A well-known and common type consists of hierarchically constructed 
classification systems, as implemented in libraries. Such systems 
have little potential for quick and easy adaptation to the 
particular structure a user may need when searching for new 
information: once a report is put on a shelf, it is bound to its 
place. Facet classification, by contrast, provides in a sense a 
dynamically functioning space for a flexible organization, and 
such systems offer greater possibilities for lateral relations.

Recent efforts to structure information involve networks and 
schemata. Whereas the former aim at finding out how many "semantic 
primitives" are needed for a synthetic formation of concepts, the 
latter explore the advantages of an adaptively operating process. 
Schemata are based on an inferential strategy, which implies that 
only one or some of the components in the schema model may be 
activated.
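A schema in this sense can be pictured as a record whose slots may or may not be filled. The sketch below, a hypothetical Python dataclass named `PmGSchema`, assumes that a slot left empty is simply not activated by the title; it illustrates partial activation only and does not model the inferential strategy itself.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class PmGSchema:
    """Hypothetical schema record for one conceptualization.

    Each slot holds the unit that fills it, or None if the title does not
    activate that component.
    """
    problem: Optional[str] = None
    method: Optional[str] = None
    instrument: Optional[str] = None
    goal: Optional[str] = None

    def activated(self) -> List[str]:
        """Names of the components the title actually activates, in schema order."""
        return [f.name for f in fields(self) if getattr(self, f.name) is not None]


# "Skola för 80-talet" activates only Method and Goal (registers 40 and 70).
print(PmGSchema(method="Skola", goal="80-talet").activated())
```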

The starting-point in the present analysis is that an intermediate 
language can be generated from titles of scientific works, and that 
such a language can be displayed in a thesaurus. An intermediate 
language capable of representing the cognitive structure of a 
message has a higher degree of structuring than one that lacks this 
capacity. The cognitive structure of a field of science is conveyed 
through scientific documents. They contain statements about 
empirical observations which may also be represented in sentence 
form, as in the NvN paradigm. If, however, the intentions, i.e. the 
underlying propositions of sentences, are to be denoted, another 
paradigm is required, one that accounts for underlying propositions 
at a somewhat more abstract level. Provided that the language under 
consideration belongs to the Indo-European family, the paradigm 
adequate to this purpose is the Agent-action-Object (AaO) paradigm. 
The aggregation of different observations, and the causal relations 
that scientific reporting establishes between different phenomena, 
require a paradigm capable of representing the scientific process; 
such a paradigm must therefore be even more abstract than the AaO 
paradigm.

The fundamental components of a research process are "problem", 
"method", and "goal". A paradigm that can express the result of a 
number of abstracted propositions is the Problem-method-Goal (PmG) 
paradigm. Studying a title from the point of view of this paradigm 
rests on the assumption that there are cues in the overt structure 
of the title which indicate that the components represent the 
author's conceptualization. To analyse the relations between the 
individual components of a title, this work has utilized the 
organizing function of prepositions. The main advantage of using 
prepositions for a conceptual analysis of titles is that they 
function unambiguously at the title level, in contrast to the 
natural (concrete) language level. The concepts are related with 
respect to their intentions or extensions, with labels such as 
localization and direction being generalized.

To make automatic coding of concepts and conceptual relations 
possible, a set of rules has been formulated. According to this 
rule set, a title consists of one or more conceptualizations, so 
demarcation rules are necessary. Certain edits have been made in 
the empirical material, e.g. inserting demarcation markers in the 
form of commas to indicate connection, and dashes and full stops to 
mark disconnection. The basic principle of the rule system is that 
prepositions point forwards. This has an algorithmic consequence: 
when all elements after each intentional preposition have been 
determined, only the method is left. Fourteen rules operate by 
means of a dictionary consisting of 39 operators, 12 stop phrases, 
and 5 conjunctions. The program is written in ASCII-FORTRAN.
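The demarcation step can be sketched as a regular-expression split that turns an edited title into its separate conceptualizations, each of which would then be coded on its own. This is an illustration under assumptions, not the original rule set: it treats a dash (hyphen or en dash) surrounded by whitespace, or a full stop followed by whitespace or the end of the title, as a disconnection marker, and it leaves commas in place because they signal connection. The 14 rules, 12 stop phrases, and 5 conjunctions are not reproduced, and the names `DISCONNECT` and `demarcate` are hypothetical.

```python
import re

# Assumed disconnection markers: a dash (hyphen or en dash) surrounded by
# whitespace, or a full stop followed by whitespace or the end of the title.
# Hyphens inside words such as "80-talet" are not matched, and commas are
# left in place because they signal connection rather than disconnection.
DISCONNECT = re.compile(r"\s+[-\u2013]\s+|\.(?:\s+|$)")


def demarcate(title):
    """Split an edited title into its separate conceptualizations."""
    return [part.strip() for part in DISCONNECT.split(title) if part.strip()]


print(demarcate("Inlärning, minne och motivation - en översikt"))
```

The example title is invented. Requiring whitespace around the dash is the design choice that keeps compounds such as 80-talet or critical incident-metoden intact.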

The algorithm has been tested on a data base which is represen- 
tative of Swedish educational research. The data base includes 
bibliographic descriptions, mainly according to the APA standard. 
For the evaluation a total of about 9,000 bibliographic descrip- 
tions of works written in Swedish have been used. 

According to the analytical model employed, the conceptualizations 
may be extended to varying degrees. The most extended case is 
represented by a pattern that activates a Problem component, a 
Method component, an Instrument component, and a Goal component, 
together with possible extensions. A pattern is thus a structural 
representation of a conceptualization. The most characteristic 
patterns show the Problem component appearing alone, connectively, 
or together with one extension. The more complex a pattern is, the 
less often it appears. This has been reflected in a directed graph 
(Chapter 5.2). There are links between Method and Problem, as well 
as within the Problem complex. Moreover, Instrument "governs" Goal 
to a greater extent than Method "governs" Instrument and Goal. This 
picture of the research process within educational research is in 
line with the problem orientation that the authors of the titles 
expressed in an interview study.
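To make the pattern statistics concrete, the sketch below counts how often each pattern occurs, how many conceptualizations activate one, two, or three components, and how often one component directly follows another. The input `TOY` is invented toy data, not the Swedish data base, and deriving graph edges from adjacent components is an assumption about how such a directed graph could be built, not necessarily the construction used in Chapter 5.2. The labels in `LABEL` are hypothetical abbreviations.

```python
from collections import Counter

# Register number -> short label: 30, 40, 70, 80 are the main components
# (Problem, Method, Goal, Instrument); 33, 73, 83 are their extensions.
LABEL = {30: "P", 33: "Pe", 40: "M", 70: "G", 73: "Ge", 80: "I", 83: "Ie"}


def pattern(codes):
    """Render a coded conceptualization, e.g. [40, 30, 33] -> 'M-P-Pe'."""
    return "-".join(LABEL[c] for c in codes)


def pattern_frequencies(coded):
    """Count how often each pattern occurs."""
    return Counter(pattern(codes) for codes in coded)


def complexity_profile(coded):
    """Count conceptualizations by number of activated components."""
    return Counter(len(codes) for codes in coded)


def edge_counts(coded):
    """Count directed links between components that follow each other."""
    edges = Counter()
    for codes in coded:
        for a, b in zip(codes, codes[1:]):
            edges[(LABEL[a], LABEL[b])] += 1
    return edges


# Invented toy input: each inner list is one coded conceptualization.
TOY = [[30], [30], [30], [30, 33], [40, 30], [40, 30, 33], [40, 80, 70]]

print(pattern_frequencies(TOY).most_common(3))
print(sorted(complexity_profile(TOY).items()))
print(edge_counts(TOY).most_common())
```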

On the basis of the representational function of language, it has 
been assumed here that there exist several levels of 
representation, as a consequence of documents having undergone 
certain transformations. Generating an intermediate language that 
characterizes a document at a particular level of abstraction 
requires the concepts to have the same degree of abstraction. This 
means that certain features typical of a more concrete language 
are not accepted by the algorithm. Such errors have been identified 
and discussed. One conclusion, for example, is that an intermediate 
language in which prepositions function as operators only allows 
constructions of a nominal type.

Finally, it should be mentioned that the model described and 
operationalized here has been used for automatic identification and 
coding of dimensions that a manual analysis could have captured 
only with difficulty, since manual analysis builds on habituated 
frames of reference. This may be exemplified by a case discussed 
earlier: with knowledge of the author's focus of attention in the 
title "Skola för 80-talet" (School for the 80's), it is obvious 
that the concept skola (school) represents the Method component. 
That the school functions as a means of achieving social change in 
our society should not be difficult to accept.